Re: [for-next][PATCH 13/29] tracing: No need to free iter->trace in fail path of tracing_open_pipe()
On Wed, 20 Feb 2019 13:37:50 -0500
Steven Rostedt wrote:

> From: "zhangyi (F)"
>
> Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
> pipe files") use the current tracer instead of the copy in
> tracing_open_pipe(), but it forget to remove the freeing sentence in
> the error path.
>
> [ Note, this is harmless because kfree(NULL) is allowed and iter is
>   allocated with kzalloc() making iter->trace = NULL -- S. Rostedt ]

Bah, I forgot to update this. I haven't pushed to linux-next yet. As
Zhangyi replied, this is a real issue. I just wished the real issue was
explained in the change log.

I'm going to rebase this to update the change log (no code changes, so
no need to run the tests again), and also, I'll add a Cc stable.

No point in sending this out as a separate patch either, because the
merge window is going to open soon.

-- Steve

>
> Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zh...@huawei.com
>
> Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files")
> Signed-off-by: zhangyi (F)
> Signed-off-by: Steven Rostedt (VMware)
> ---
>  kernel/trace/trace.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c521b7347482..b583ff7656bb 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5624,7 +5624,6 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>  	return ret;
>  
>  fail:
> -	kfree(iter->trace);
>  	kfree(iter);
>  	__trace_array_put(tr);
>  	mutex_unlock(&trace_types_lock);
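For readers following the thread, here is a minimal userspace sketch of the
ownership rule behind this fix. It is not the kernel code: struct pipe_iter,
open_pipe() and current_tracer are hypothetical names, and calloc()/free()
stand in for kzalloc()/kfree(). The assumption being illustrated is that once
the iterator only borrows the current tracer instead of owning a copy, the
error path may free the iterator but not the tracer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct tracer {
	const char *name;
};

struct pipe_iter {
	struct tracer *trace;	/* borrowed: points at the shared current tracer */
};

static struct tracer current_tracer = { "nop" };

/*
 * Open a pipe iterator. If should_fail is nonzero, take the error path
 * after the borrowed pointer has been set, which is the case the patch
 * is about. Returns NULL on failure.
 */
static struct pipe_iter *open_pipe(int should_fail)
{
	struct pipe_iter *iter = calloc(1, sizeof(*iter));

	if (!iter)
		return NULL;

	iter->trace = &current_tracer;	/* no copy is made, so nothing to own */

	if (should_fail)
		goto fail;

	return iter;

fail:
	/* Do NOT free(iter->trace): it is not ours. Free only the iterator. */
	free(iter);
	return NULL;
}
```

In this model a failed open leaves current_tracer untouched, which is the
property the one-line deletion restores.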
Re: [RFC][PATCH 00/16] sched: Core scheduling
On 2/20/19 1:42 AM, Peter Zijlstra wrote:
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
> On Tue, Feb 19, 2019 at 02:07:01PM -0800, Greg Kerr wrote:
>> Thanks for posting this patchset Peter. Based on the patch titled,
>> "sched: A quick and dirty cgroup tagging interface," I believe cgroups
>> are used to define co-scheduling groups in this implementation.
>>
>> Chrome OS engineers (kerr...@google.com, mpden...@google.com, and
>> pal...@google.com) are considering an interface that is usable by
>> unprivileged userspace apps. cgroups are a global resource that
>> require privileged access. Have you considered an interface that is
>> akin to namespaces? Consider the following strawperson API proposal
>> (I understand prctl() is generally used for process specific actions,
>> so we aren't married to using prctl()):
>
> I don't think we're anywhere near the point where I care about
> interfaces with this stuff.
>
> Interfaces are a trivial but tedious matter once the rest works to
> satisfaction.
>
> As it happens; there is actually a bug in that very cgroup patch that
> can cause undesired scheduling. Try spotting and fixing that.
>
> Another question is if we want to be L1TF complete (and how strict) or
> not, and if so, build the missing pieces (for instance we currently
> don't kick siblings on IRQ/trap/exception entry -- and yes that's nasty
> and horrible code and missing for that reason).

I remember asking Paul about this and he mentioned he has an Address
Space Isolation proposal to cover this. So it seems this is out of scope
of core scheduling?

> So first; does this provide what we need? If that's sorted we can
> bike-shed on uapi/abi.
Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case
On Wed, 2019-02-20 at 13:33 -0500, Steven Rostedt wrote:
> On Wed, 20 Feb 2019 12:10:31 -0600
> Tom Zanussi wrote:
>
> > As far as I understand it (there's no other case of an xfail test in
> > the testsuite, so nothing similar to compare it to), the test output is
> > correct - here we get the expected fail, XFAIL, and not a FAIL as any
> > test, xfail or normal, that failed would produce:
>
> Yeah, I've been staring at the code, and commit:
>
>  915de2adb584a ftracetest: Add POSIX.3 standard and XFAIL result codes
>
> >   tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/
> >   === Ftrace unit tests ===
> >   [1] event trigger - test inter-event histogram trigger expected fail actions  [XFAIL]
> >   [2] event trigger - test extended error support  [PASS]
> >
> > And here the summary shows none failed, while we did have one expected
> > xfail, but that's what was expected, and not a failure:
> >
> >   # of passed:  31
> >   # of failed:  0
> >   # of unresolved:  0
> >   # of untested:  0
> >   # of unsupported:  0
> >   # of xfailed:  1
>
> Yeah, but it's marked as RED, which is why I thought it was a failure.
>
> >   # of undefined(test bug):  0
> >
> > If that's not correct, I'll fix it but at this point I'm not sure what
> > the output should be if not that.
>
> OK, so this has nothing to do with your patch set. I've tested
> everything else, and I'm ready to finally push my tree to linux-next.
>
> I'm thinking that we should get rid of xfail, as it's really confusing,
> and I don't understand its purpose. But that shouldn't stop pushing
> your patches.
>

OK, I'm fine with removing it, if it's too confusing. IIRC Masami
suggested it to highlight that not all actions and handlers can be used
together, so I guess I'll hold off on a patch removing it until he can
chime in...

Thanks,

Tom

> Thanks,
>
> -- Steve
Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier
On Wed, Feb 20, 2019 at 10:07:36AM -0800, Nick Desaulniers wrote:

> I like Evgenii's idea:
> https://bugs.llvm.org/show_bug.cgi?id=38809#c10

That's a suggestion to tune the inlining heuristics.

> While I myself share Arnd's goal of driving compiler warnings to zero,
> in general I'd prefer not to disable warning-producing-features or
> disable warnings outright for cases where we have some ideas of
> changes we can make to the compiler. There's probably a list now of
> false warnings produced by old versions of Clang from bugs in Clang
> that we fixed. I'm not interested in additionally trying to work
> around those somehow in kernel sources.

We do have infrastructure in the kernel for managing warnings based on
compiler version (Arnd was looking at some improvements to that IIRC),
if we've got a kernel that builds with a given compiler it's worth
looking at tuning what we do with that compiler. If newer versions of
the compiler work better or have new options we can turn things on for
them.

> Qian previously pointed out that most drivers don't produce this
> warning under KASAN+Clang. While 114 is a lot, what are the chances
> that someone NEEDS a KASAN+Clang build to compile warning free and
> happen to include one of these problematic drivers? And if there is a
> chance they do observe the warning, are we doing a disservice by
> disabling the feature (-asan-stack=1) outright for the whole kernel,
> or disabling the warning (`-Wstack-frame-larger-than=`) which can flag
> issues unrelated to KASAN?

People doing treewide work and subsystem maintainers are a reasonably
important target for this sort of thing - for example people looking at
the kernelci output. It's a lot easier to pay attention to problems if
you don't have to wade through large numbers of false positives.
[for-next][PATCH 04/29] tracing: Add comment to predicate_parse() about "&&" or "||"
From: "Steven Rostedt (VMware)"

As the predicate_parse() code is rather complex, commenting subtleties is
important. The switch case statement should be commented to describe that it
is only looking for two '&' or '|' together, which is why the fall through
to an error is after the check.

Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_events_filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb694756c4bb..f052ecb085e9 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -491,6 +491,7 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 			break;
 		case '&':
 		case '|':
+			/* accepting only "&&" or "||" */
 			if (next[1] == next[0]) {
 				ptr++;
 				break;
-- 
2.20.1
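The check the new comment describes can be shown in isolation. The sketch
below is only an illustration: scan_logic_op() is a hypothetical helper, not
part of predicate_parse(), and its return values are an assumption made for
the example. It accepts a doubled '&' or '|' and lets everything else,
including a lone '&' or '|', fall through to the error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel's predicate_parse(): return 2 if
 * `next` starts with "&&" or "||", and -1 for a lone '&' or '|' or any
 * other character.
 */
static int scan_logic_op(const char *next)
{
	switch (next[0]) {
	case '&':
	case '|':
		/* accepting only "&&" or "||" */
		if (next[1] == next[0])
			return 2;
		/* fall through */
	default:
		return -1;
	}
}
```

Comparing next[1] against next[0] handles both operators with one test,
which is why mixed pairs such as "&|" are rejected as well.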
[for-next][PATCH 01/29] function_graph: Support displaying relative timestamp
From: Changbin Du When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) #- #| task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) #- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-=> irqs-off # / _=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIMECPU TASK/PID DURATION FUNCTION CALLS # | | || | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us| __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us| rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us| rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us|} [..] 
124.974826 | 2)-0| d..2 | finish_task_switch() { 124.974826 | 2)-0| d..2 | _raw_spin_unlock_irq() { 124.974827 | 2)-0| d..2 0.000 us| _raw_spin_unlock_irq(); 124.974828 | 2)-0| d..2 0.000 us| tracer_hardirqs_on(); -0 2d..2 552us : => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) #- #| task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) #- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-=> irqs-off # / _=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID DURATION FUNCTION CALLS # | | || | | | | | | 0 us | 7) sshd-1704| d..1 0.000 us| __schedule(); 1 us | 7) sshd-1704| d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704| d..1 0.611 us| rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704| d..1 0.484 us| rcu_qs(); 3 us | 7) sshd-1704| d..1 2.599 us|} [..] 
509 us | 7)-0| d..2 | finish_task_switch() { 510 us | 7)-0| d..2 | _raw_spin_unlock_irq() { 510 us | 7)-0| d..2 0.000 us| _raw_spin_unlock_irq(); 512 us | 7)-0| d..2 0.000 us| tracer_hardirqs_on(); -0 7d..2 543us : => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin...@gmail.com Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.h | 9 + kernel/trace/trace_functions_graph.c | 25 + kernel/trace/trace_irqsoff.c | 2 +- kernel/trace/trace_sched_wakeup.c| 2 +- 4 files changed, 32 insertions(+), 6 deletions(-) diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 08900828d282..a34fa5e76abb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -855,10 +855,11 @@ static __always_inline bool ftrace_hash_empty(struct ftrace_hash *hash) #define TRACE_GRAPH_PRINT_PROC 0x8 #define TRACE_GRAPH_PRINT_DURATION 0x10 #define TRACE_GRAPH_PRINT_ABS_TIME 0x20 -#define TRACE_GRAPH_PRINT_IRQS 0x40 -#define TRACE_GRAPH_PRINT_TAIL 0x80 -#define TRACE_GRAPH_SLEEP_TIME 0x100 -#define TRACE_GRAPH_GRAPH_TIME 0x200 +#define TRACE_GRAPH_PRINT_REL_TIME 0x40 +#define TRACE_GRAPH_PRINT_IRQS 0x80 +#define TRACE_GRAPH_PRINT_TAIL 0x100 +#define TRACE_GRAPH_SLEEP_TIME 0x200 +#define TRACE_GRAPH_GRAPH_TIME 0x400
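The renumbering in the hunk above works because each flag occupies its own
bit, so a new flag can be slotted in as long as the later ones shift up. A
small sketch of that invariant follows. The flag values are copied from the
'+' side of the hunk as quoted here; graph_flags[] and flags_are_disjoint()
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values as they appear on the '+' side of the hunk above. */
#define TRACE_GRAPH_PRINT_PROC		0x8
#define TRACE_GRAPH_PRINT_DURATION	0x10
#define TRACE_GRAPH_PRINT_ABS_TIME	0x20
#define TRACE_GRAPH_PRINT_REL_TIME	0x40
#define TRACE_GRAPH_PRINT_IRQS		0x80
#define TRACE_GRAPH_PRINT_TAIL		0x100
#define TRACE_GRAPH_SLEEP_TIME		0x200
#define TRACE_GRAPH_GRAPH_TIME		0x400

static const unsigned int graph_flags[] = {
	TRACE_GRAPH_PRINT_PROC, TRACE_GRAPH_PRINT_DURATION,
	TRACE_GRAPH_PRINT_ABS_TIME, TRACE_GRAPH_PRINT_REL_TIME,
	TRACE_GRAPH_PRINT_IRQS, TRACE_GRAPH_PRINT_TAIL,
	TRACE_GRAPH_SLEEP_TIME, TRACE_GRAPH_GRAPH_TIME,
};

/* Hypothetical check: every flag is a single bit and no two share a bit. */
static int flags_are_disjoint(const unsigned int *flags, size_t n)
{
	unsigned int seen = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int f = flags[i];

		if (f == 0 || (f & (f - 1)) != 0)	/* not exactly one bit */
			return 0;
		if (seen & f)				/* bit already taken */
			return 0;
		seen |= f;
	}
	return 1;
}
```

Had TRACE_GRAPH_PRINT_REL_TIME been given 0x40 without moving
TRACE_GRAPH_PRINT_IRQS, the second test in the loop would catch the clash.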
Re: [PATCH] s390/jump_label: Correct asm contraint
On 2/20/19 12:58 AM, Heiko Carstens wrote: On Sat, Feb 09, 2019 at 12:34:20PM -0800, Laura Abbott wrote: On 2/5/19 12:43 PM, Heiko Carstens wrote: On Tue, Jan 29, 2019 at 08:25:58AM +0100, Laura Abbott wrote: On 1/23/19 5:24 AM, Heiko Carstens wrote: On Wed, Jan 23, 2019 at 01:55:13PM +0100, Laura Abbott wrote: There's a build failure with gcc9: ./arch/s390/include/asm/jump_label.h: Assembler messages: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 ... I've had to turn off s390 in Fedora until this gets fixed :( Laura, the patch below should fix this (temporarily). If possible, could you give it a try? It seems to work for me. Subject: [PATCH] s390: disable section anchors Tested-by: Laura Abbott < The patch won't be used. In the meantime Ilya provided a gcc 9 and kernel patch which should fix this. The kernel patch is available here https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features=146448524bddbf6dfc62de31957e428de001cbda and will go upstream during the next merge window. Note: this obviously also requires to update the gcc 9 version in Fedora, so it contains Ilya's patch, to be able to compile the kernel. Thanks, Heiko Thanks. I'll keep an eye out for that during the next merge window.
[for-next][PATCH 00/29] tracing: Updates for 5.1
  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: 5308e9705d9a017f0e732610ac0a7cab52fb01f7


Changbin Du (6):
      function_graph: Support displaying relative timestamp
      tracing: Show more info for funcgraph wakeup tracers
      tracing: Put a margin between flags and duration for wakeup tracers
      tracing/doc: Add latency tracer funcgraph example
      tracing: Show stacktrace for wakeup tracers
      tracing: Change the function format to display function names by perf

Elena Reshetova (1):
      uprobes: convert uprobe.ref to refcount_t

Mathieu Malaterre (2):
      tracing: Annotate implicit fall through in parse_probe_arg()
      tracing: Annotate implicit fall through in predicate_parse()

Miroslav Benes (1):
      ring-buffer: Remove unused function ring_buffer_page_len()

Steven Rostedt (VMware) (3):
      tracing: Add comment to predicate_parse() about "&&" or "||"
      ftrace: Allow enabling of filters via index of available_filter_functions
      tracing: Comment why cond_snapshot is checked outside of max_lock protection

Tom Zanussi (15):
      tracing: Refactor hist trigger action code
      tracing: Make hist trigger Documentation better reflect actions/handlers
      tracing: Split up onmatch action data
      tracing: Generalize hist trigger onmax and save action
      tracing: Add conditional snapshot
      tracing: Add hist trigger snapshot() action
      tracing: Add hist trigger snapshot() action Documentation
      tracing: Add hist trigger onchange() handler
      tracing: Add hist trigger onchange() handler Documentation
      tracing: Add alternative synthetic event trace action syntax
      tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
      tracing: Add hist trigger snapshot() action test case
      tracing: Add hist trigger onchange() handler test case
      tracing: Add alternative synthetic event trace action test case
      tracing: Add hist trigger action 'expected fail' test case

zhangyi (F) (1):
      tracing: No need to free iter->trace in fail path of tracing_open_pipe()

----
 Documentation/trace/ftrace.rst                     |   89 ++
 Documentation/trace/histogram.rst                  |  316 +-
 include/linux/ring_buffer.h                        |    2 -
 kernel/events/uprobes.c                            |    8 +-
 kernel/trace/ftrace.c                              |   30 +
 kernel/trace/ring_buffer.c                         |   14 -
 kernel/trace/trace.c                               |  217 +++-
 kernel/trace/trace.h                               |   66 +-
 kernel/trace/trace_entries.h                       |   41 +-
 kernel/trace/trace_events_filter.c                 |    7 +
 kernel/trace/trace_events_hist.c                   | 1048 ++--
 kernel/trace/trace_functions_graph.c               |   30 +-
 kernel/trace/trace_irqsoff.c                       |    2 +-
 kernel/trace/trace_probe.c                         |    1 +
 kernel/trace/trace_sched_wakeup.c                  |   11 +-
 .../inter-event/trigger-action-hist-xfail.tc       |   30 +
 .../inter-event/trigger-extended-error-support.tc  |    1 +
 .../inter-event/trigger-field-variable-support.tc  |    1 +
 .../trigger-inter-event-combined-hist.tc           |    1 +
 .../inter-event/trigger-multi-actions-accept.tc    |    1 +
 .../inter-event/trigger-onchange-action-hist.tc    |   28 +
 .../inter-event/trigger-onmatch-action-hist.tc     |    1 +
 .../trigger-onmatch-onmax-action-hist.tc           |    1 +
 .../inter-event/trigger-onmax-action-hist.tc       |    1 +
 .../inter-event/trigger-snapshot-action-hist.tc    |   43 +
 .../trigger-synthetic-event-createremove.tc        |    1 +
 .../inter-event/trigger-trace-action-hist.tc       |   42 +
 27 files changed, 1659 insertions(+), 374 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
[for-next][PATCH 05/29] tracing: Show more info for funcgraph wakeup tracers
From: Changbin Du

Add these info fields to funcgraph wakeup tracers:
  o Show CPU info since the waker could be on a different CPU.
  o Show function duration and overhead.
  o Show IRQ markers.

Link: http://lkml.kernel.org/r/20190101154614.8887-3-changbin...@gmail.com

Signed-off-by: Changbin Du
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_sched_wakeup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index b6c5fa10347e..da5b6e012840 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -180,8 +180,11 @@ static void wakeup_trace_close(struct trace_iterator *iter)
 }
 
 #define GRAPH_TRACER_FLAGS (TRACE_GRAPH_PRINT_PROC | \
+			    TRACE_GRAPH_PRINT_CPU | \
 			    TRACE_GRAPH_PRINT_REL_TIME | \
-			    TRACE_GRAPH_PRINT_DURATION)
+			    TRACE_GRAPH_PRINT_DURATION | \
+			    TRACE_GRAPH_PRINT_OVERHEAD | \
+			    TRACE_GRAPH_PRINT_IRQS)
 
 static enum print_line_t wakeup_print_line(struct trace_iterator *iter)
 {
-- 
2.20.1
[for-next][PATCH 12/29] uprobes: convert uprobe.ref to refcount_t
From: Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable uprobe.ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.

For the uprobe.ref it might make a difference
in following places:
 - put_uprobe(): decrement in refcount_dec_and_test() only
   provides RELEASE ordering and control dependency on success
   vs. fully ordered atomic counterpart

Link: http://lkml.kernel.org/r/1547637627-29526-1-git-send-email-elena.reshet...@intel.com

Suggested-by: Kees Cook
Acked-by: Oleg Nesterov
Reviewed-by: David Windsor
Reviewed-by: Hans Liljestrand
Reviewed-by: Srikar Dronamraju
Signed-off-by: Elena Reshetova
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/events/uprobes.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8aef47ee7bfa..0a8bf7a4fc5e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -66,7 +66,7 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
-	atomic_t		ref;
+	refcount_t		ref;
 	struct rw_semaphore	register_rwsem;
 	struct rw_semaphore	consumer_rwsem;
 	struct list_head	pending_list;
@@ -560,13 +560,13 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
 {
-	atomic_inc(&uprobe->ref);
+	refcount_inc(&uprobe->ref);
 	return uprobe;
 }
 
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (atomic_dec_and_test(&uprobe->ref)) {
+	if (refcount_dec_and_test(&uprobe->ref)) {
 		/*
 		 * If application munmap(exec_vma) before uprobe_unregister()
 		 * gets called, we don't get a chance to remove uprobe from
@@ -657,7 +657,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 	rb_link_node(&uprobe->rb_node, parent, p);
 	rb_insert_color(&uprobe->rb_node, &uprobes_tree);
 	/* get access + creation ref */
-	atomic_set(&uprobe->ref, 2);
+	refcount_set(&uprobe->ref, 2);
 
 	return u;
 }
-- 
2.20.1
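To make the overflow and underflow argument in the changelog concrete, here
is a rough userspace model of a counter that refuses to wrap or to come back
from zero. It is explicitly NOT the kernel's refcount_t: struct sketch_ref
and the sketch_ref_*() functions are hypothetical, they are built on C11
<stdatomic.h> with default (sequentially consistent) ordering, and they say
nothing about the weaker ordering the changelog warns about.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace sketch; NOT the kernel's refcount_t implementation. */
struct sketch_ref {
	atomic_uint count;
};

static void sketch_ref_set(struct sketch_ref *r, unsigned int n)
{
	atomic_store(&r->count, n);
}

/* Take a reference unless the count is already zero or saturated. */
static bool sketch_ref_inc_not_zero(struct sketch_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* object is already being freed */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}

/* Drop a reference; return true when the caller dropped the last one. */
static bool sketch_ref_dec_and_test(struct sketch_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* refuse to underflow or leave saturation */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

	return old == 1;
}
```

With a plain atomic counter, an extra put after the count hits zero would
wrap around and an extra get would resurrect a freed object. In this model
both calls simply fail, which is the class of bug the conversion is meant to
contain.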
[for-next][PATCH 15/29] tracing: Make hist trigger Documentation better reflect actions/handlers
From: Tom Zanussi The action/handler code refactoring didn't change the action/handler syntax, but did generalize it - the Documentation should reflect that. Link: http://lkml.kernel.org/r/c2fe4144678829c70cad67aaa847dca27d57cb83.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/histogram.rst | 56 --- 1 file changed, 43 insertions(+), 13 deletions(-) diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 7dda76503127..63e522107e59 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -25,7 +25,7 @@ Documentation written by Tom Zanussi hist:keys=[:values=] [:sort=][:size=#entries][:pause][:continue] - [:clear][:name=histname1] [if ] + [:clear][:name=histname1][:.] [if ] When a matching event is hit, an entry is added to a hash table using the key(s) and value(s) named. Keys and values correspond to @@ -1831,21 +1831,51 @@ and looks and behaves just like any other event:: Like any other event, once a histogram is enabled for the event, the output can be displayed by reading the event's 'hist' file. -2.2.3 Hist trigger 'actions' - +2.2.3 Hist trigger 'handlers' and 'actions' +--- -A hist trigger 'action' is a function that's executed whenever a -histogram entry is added or updated. +A hist trigger 'action' is a function that's executed (in most cases +conditionally) whenever a histogram entry is added or updated. -The default 'action' if no special function is explicitly specified is -as it always has been, to simply update the set of values associated -with an entry. Some applications, however, may want to perform -additional actions at that point, such as generate another event, or -compare and save a maximum. +When a histogram entry is added or updated, a hist trigger 'handler' +is what decides whether the corresponding action is actually invoked +or not. -The following additional actions are available. 
To specify an action -for a given event, simply specify the action between colons in the -hist trigger specification. +Hist trigger handlers and actions are paired together in the general +form: + + . + +To specify a handler.action pair for a given event, simply specify +that handler.action pair between colons in the hist trigger +specification. + +In theory, any handler can be combined with any action, but in +practice, not every handler.action combination is currently supported; +if a given handler.action combination isn't supported, the hist +trigger will fail with -EINVAL; + +The default 'handler.action' if none is explicity specified is as it +always has been, to simply update the set of values associated with an +entry. Some applications, however, may want to perform additional +actions at that point, such as generate another event, or compare and +save a maximum. + +The supported handlers and actions are listed below, and each is +described in more detail in the following paragraphs, in the context +of descriptions of some common and useful handler.action combinations. + +The available handlers are: + + - onmatch(matching.event)- invoke action on any addition or update + - onmax(var) - invoke action if var exceeds current max + +The available actions are: + + - (param list) - generate synthetic event + - save(field,...)- save current event fields + +The following commonly-used handler.action pairs are available: - onmatch(matching.event).(param list) -- 2.20.1
[for-next][PATCH 02/29] tracing: Annotate implicit fall through in parse_probe_arg()
From: Mathieu Malaterre

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit removes the following warning:

  kernel/trace/trace_probe.c:302:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-1-ma...@debian.org

Signed-off-by: Mathieu Malaterre
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_probe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9962cb5da8ac..89da34b326e3 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -300,6 +300,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 
 	case '+':	/* deref memory */
 		arg++;	/* Skip '+', because kstrtol() rejects it. */
+		/* fall through */
 	case '-':
 		tmp = strchr(arg, '(');
 		if (!tmp)
-- 
2.20.1
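The pattern being annotated, one case doing a small fixup and then
deliberately sharing the next case's body, can be sketched in userspace as
below. parse_signed_offset() is a hypothetical helper and not
parse_probe_arg(). strtol() stands in for kstrtol(); unlike the kernel
helper as described in the patch comment, strtol() would accept a leading
'+', so the skip here only mirrors the kernel flow.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not parse_probe_arg(): parse "+N" or "-N" into
 * *out. Returns 0 on success, -1 otherwise.
 */
static int parse_signed_offset(const char *arg, long *out)
{
	char *end;

	switch (arg[0]) {
	case '+':
		arg++;	/* skip '+', then share the '-' handling below */
		/* fall through */
	case '-':
		*out = strtol(arg, &end, 0);
		/* succeed only if at least one digit was consumed and nothing trails */
		return (end != arg && *end == '\0') ? 0 : -1;
	default:
		return -1;
	}
}
```

The comment is what tells both the compiler and the reader that the missing
break is intentional, which is all the one-line patch adds.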
[for-next][PATCH 07/29] tracing/doc: Add latency tracer funcgraph example
From: Changbin Du This add an example about how to use funcgraph with latency tracers. Link: http://lkml.kernel.org/r/20190101154614.8887-6-changbin...@gmail.com Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/ftrace.rst | 51 ++ 1 file changed, 51 insertions(+) diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 0131df7f5968..6ce2763a2a3e 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -1396,6 +1396,57 @@ enabling function tracing, we incur an added overhead. This overhead may extend the latency times. But nevertheless, this trace has provided some very helpful debugging information. +If we prefer function graph output instead of function, we can set +display-graph option:: + with echo 1 > options/display-graph + + # tracer: irqsoff + # + # irqsoff latency trace v1.1.5 on 4.20.0-rc6+ + # + # latency: 3751 us, #274/274, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) + #- + #| task: bash-1507 (uid:0 nice:0 policy:0 rt_prio:0) + #- + # => started at: free_debug_processing + # => ended at: return_to_handler + # + # + # _-=> irqs-off + # / _=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / + # REL TIME CPU TASK/PID DURATION FUNCTION CALLS + # | | || | | | | | | + 0 us | 0) bash-1507| d... | 0.000 us| _raw_spin_lock_irqsave(); + 0 us | 0) bash-1507| d..1 | 0.378 us| do_raw_spin_trylock(); + 1 us | 0) bash-1507| d..2 | |set_track() { + 2 us | 0) bash-1507| d..2 | | save_stack_trace() { + 2 us | 0) bash-1507| d..2 | | __save_stack_trace() { + 3 us | 0) bash-1507| d..2 | | __unwind_start() { + 3 us | 0) bash-1507| d..2 | | get_stack_info() { + 3 us | 0) bash-1507| d..2 | 0.351 us| in_task_stack(); + 4 us | 0) bash-1507| d..2 | 1.107 us|} + [...] 
+ 3750 us | 0) bash-1507| d..1 | 0.516 us| do_raw_spin_unlock(); + 3750 us | 0) bash-1507| d..1 | 0.000 us| _raw_spin_unlock_irqrestore(); + 3764 us | 0) bash-1507| d..1 | 0.000 us| tracer_hardirqs_on(); + bash-15070d..1 3792us : + => free_debug_processing + => __slab_free + => kmem_cache_free + => vm_area_free + => remove_vma + => exit_mmap + => mmput + => flush_old_exec + => load_elf_binary + => search_binary_handler + => __do_execve_file.isra.32 + => __x64_sys_execve + => do_syscall_64 + => entry_SYSCALL_64_after_hwframe preemptoff -- -- 2.20.1
[for-next][PATCH 06/29] tracing: Put a margin between flags and duration for wakeup tracers
From: Changbin Du Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) #- #| task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) #- # # _-=> irqs-off # / _=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID DURATION FUNCTION CALLS # | | || | | | | | | 0 us | 0)-0| dNh5 | /* 0:120:R + [000]11: 0:R migration/0 */ 2 us | 0)-0| dNh5 0.000 us|(null)(); 4 us | 0)-0| dNh4 | _raw_spin_unlock() { 4 us | 0)-0| dNh4 0.304 us| preempt_count_sub(); 5 us | 0)-0| dNh3 1.063 us| } 5 us | 0)-0| dNh3 0.266 us| ttwu_stat(); 6 us | 0)-0| dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0)-0| dNh3 0.273 us| preempt_count_sub(); 6 us | 0)-0| dNh2 0.818 us| } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) #- #| task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) #- # # _-=> irqs-off # / _=> need-resched #| / _---=> hardirq/softirq #|| / _--=> preempt-depth #||| / # REL TIME CPU TASK/PID DURATION FUNCTION CALLS # | | || | | | | | | 0 us | 0)-0| dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0)-0| dNs. | 0.000 us| (null)(); 67 us | 0)-0| dNs. | 0.721 us| ttwu_stat(); 69 us | 0)-0| dNs. | 0.607 us| _raw_spin_unlock_irqrestore(); 71 us | 0)-0| .Ns. | 0.598 us| _raw_spin_lock_irq(); 72 us | 0)-0| .Ns. | 0.584 us| _raw_spin_lock_irq(); 73 us | 0)-0| dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0)-0| dNs. | | call_timer_fn() { 76 us | 0)-0| dNs. | | delayed_work_timer_fn() { 76 us | 0)-0| dNs. | | __queue_work() { ... 
Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin...@gmail.com Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_functions_graph.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c index 16ebbdd7b22e..69ebf3c2f1b5 100644 --- a/kernel/trace/trace_functions_graph.c +++ b/kernel/trace/trace_functions_graph.c @@ -380,6 +380,7 @@ static void print_graph_lat_fmt(struct trace_seq *s, struct trace_entry *entry) { trace_seq_putc(s, ' '); trace_print_lat_fmt(s, entry); + trace_seq_puts(s, " | "); } /* If the pid changed since the last trace, output this event */ @@ -1153,7 +1154,7 @@ static void __print_graph_headers_flags(struct trace_array *tr, if (flags & TRACE_GRAPH_PRINT_PROC) seq_puts(s, " TASK/PID "); if (lat) - seq_puts(s, ""); + seq_puts(s, " "); if (flags & TRACE_GRAPH_PRINT_DURATION) seq_puts(s, " DURATION "); seq_puts(s, " FUNCTION CALLS\n"); @@ -1169,7 +1170,7 @@ static void __print_graph_headers_flags(struct trace_array *tr, if (flags & TRACE_GRAPH_PRINT_PROC) seq_puts(s, " ||"); if (lat) - seq_puts(s, ""); + seq_puts(s, " "); if (flags & TRACE_GRAPH_PRINT_DURATION) seq_puts(s, " | | "); seq_puts(s, " | | | |\n"); -- 2.20.1
[for-next][PATCH 08/29] tracing: Show stacktrace for wakeup tracers
From: Changbin Du

This aligns the behavior of the wakeup tracers with the irqsoff latency
tracer: a stacktrace is recorded at the beginning and at the end of the
wakeup. The stacktrace shows us what is happening in the kernel.

Link: http://lkml.kernel.org/r/20190116160249.7554-1-changbin...@gmail.com

Signed-off-by: Changbin Du
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_sched_wakeup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index da5b6e012840..f4fe7d1781e9 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -475,6 +475,7 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 
 	__trace_function(wakeup_trace, CALLER_ADDR0, CALLER_ADDR1, flags, pc);
 	tracing_sched_switch_trace(wakeup_trace, prev, next, flags, pc);
+	__trace_stack(wakeup_trace, flags, 0, pc);
 
 	T0 = data->preempt_timestamp;
 	T1 = ftrace_now(cpu);
@@ -586,6 +587,7 @@ probe_wakeup(void *ignore, struct task_struct *p)
 	data = per_cpu_ptr(wakeup_trace->trace_buffer.data, wakeup_cpu);
 	data->preempt_timestamp = ftrace_now(cpu);
 	tracing_sched_wakeup_trace(wakeup_trace, p, current, flags, pc);
+	__trace_stack(wakeup_trace, flags, 0, pc);
 
 	/*
 	 * We must be careful in using CALLER_ADDR2. But since wake_up
-- 
2.20.1
[for-next][PATCH 20/29] tracing: Add hist trigger snapshot() action Documentation
From: Tom Zanussi Add Documentation for the hist:handlerXXX($var).snapshot() action. Link: http://lkml.kernel.org/r/445861d7822cd4b6aeaea1cecfcdbda466502148.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/histogram.rst | 110 ++ 1 file changed, 110 insertions(+) diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 63e522107e59..353317bc3825 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -1874,6 +1874,7 @@ The available actions are: - (param list) - generate synthetic event - save(field,...)- save current event fields + - snapshot() - snapshot the trace buffer The following commonly-used handler.action pairs are available: @@ -2030,6 +2031,115 @@ The following commonly-used handler.action pairs are available: Entries: 2 Dropped: 0 + - onmax(var).snapshot() + +The 'onmax(var).snapshot()' hist trigger action is invoked +whenever the value of 'var' associated with a histogram entry +exceeds the current maximum contained in that variable. + +The end result is that a global snapshot of the trace buffer will +be saved in the tracing/snapshot file if 'var' exceeds the current +maximum for any hist trigger entry. + +Note that in this case the maximum is a global maximum for the +current trace instance, which is the maximum across all buckets of +the histogram. The key of the specific trace event that caused +the global maximum and the global maximum itself are displayed, +along with a message stating that a snapshot has been taken and +where to find it. The user can use the key information displayed +to locate the corresponding bucket in the histogram for even more +detail. + +As an example the below defines a couple of hist triggers, one for +sched_waking and another for sched_switch, keyed on pid. 
Whenever +a sched_waking event occurs, the timestamp is saved in the entry +corresponding to the current pid, and when the scheduler switches +back to that pid, the timestamp difference is calculated. If the +resulting latency, stored in wakeup_lat, exceeds the current +maximum latency, a snapshot is taken. As part of the setup, all +the scheduler events are also enabled, which are the events that +will show up in the snapshot when it is taken at some point: + +# echo 1 > /sys/kernel/debug/tracing/events/sched/enable + +# echo 'hist:keys=pid:ts0=common_timestamp.usecs \ +if comm=="cyclictest"' >> \ +/sys/kernel/debug/tracing/events/sched/sched_waking/trigger + +# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ +onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \ + prev_comm):onmax($wakeup_lat).snapshot() \ + if next_comm=="cyclictest"' >> \ + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + +When the histogram is displayed, for each bucket the max value +and the saved values corresponding to the max are displayed +following the rest of the fields. + +If a snaphot was taken, there is also a message indicating that, +along with the value and event that triggered the global maximum: + +# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist + { next_pid: 2101 } hitcount:200 + max: 52 next_prio:120 next_comm: cyclictest \ +prev_pid: 0 prev_prio:120 prev_comm: swapper/6 + + { next_pid: 2103 } hitcount: 1326 + max:572 next_prio: 19 next_comm: cyclictest \ +prev_pid: 0 prev_prio:120 prev_comm: swapper/1 + + { next_pid: 2102 } hitcount: 1982 \ + max: 74 next_prio: 19 next_comm: cyclictest \ +prev_pid: 0 prev_prio:120 prev_comm: swapper/5 + +Snapshot taken (see tracing/snapshot). 
Details: + triggering value { onmax($wakeup_lat) }:572 \ + triggered by event with key: { next_pid: 2103 } + +Totals: +Hits: 3508 +Entries: 3 +Dropped: 0 + +In the above case, the event that triggered the global maximum has +the key with next_pid == 2103. If you look at the bucket that has +2103 as the key, you'll find the additional values save()'d along +with the local maximum for that bucket, which should be the same +as the global maximum (since that was the same value that +triggered the global snapshot). + +And finally, looking at the snapshot data should show at or near +the end the event that triggered the snapshot (in this case you +can verify the timestamps between the sched_waking and +
[for-next][PATCH 13/29] tracing: No need to free iter->trace in fail path of tracing_open_pipe()
From: "zhangyi (F)" Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") uses the current tracer instead of the copy in tracing_open_pipe(), but it forgot to remove the freeing statement in the error path. [ Note, this is harmless because kfree(NULL) is allowed and iter is allocated with kzalloc() making iter->trace = NULL -- S. Rostedt ] Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zh...@huawei.com Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") Signed-off-by: zhangyi (F) Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c521b7347482..b583ff7656bb 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5624,7 +5624,6 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp) return ret; fail: - kfree(iter->trace); kfree(iter); __trace_array_put(tr); mutex_unlock(&trace_types_lock); -- 2.20.1
[for-next][PATCH 18/29] tracing: Add conditional snapshot
From: Tom Zanussi Currently, tracing snapshots are context-free - they capture the ring buffer contents at the time the tracing_snapshot() function was invoked, and nothing else. Additionally, they're always taken unconditionally - the calling code can decide whether or not to take a snapshot, but the data used to make that decision is kept separately from the snapshot itself. This change adds the ability to associate with each trace instance some user data, along with an 'update' function that can use that data to determine whether or not to actually take a snapshot. The update function can then update that data along with any other state (as part of the data presumably), if warranted. Because snapshots are 'global' per-instance, only one user can enable and use a conditional snapshot for any given trace instance. To enable a conditional snapshot (see details in the function and data structure comments), the user calls tracing_snapshot_cond_enable(). Similarly, to disable a conditional snapshot and free it up for other users, tracing_snapshot_cond_disable() should be called. To actually initiate a conditional snapshot, tracing_snapshot_cond() should be called. tracing_snapshot_cond() will invoke the update() callback, allowing the user to decide whether or not to actually take the snapshot and update the user-defined data associated with the snapshot. If the callback returns 'true', tracing_snapshot_cond() will then actually take the snapshot and return. This scheme allows for flexibility in snapshot implementations - for example, by implementing slightly different update() callbacks, snapshots can be taken in situations where the user is only interested in taking a snapshot when a new maximum is hit versus when a value changes in any way at all. Future patches will demonstrate both cases. 
Link: http://lkml.kernel.org/r/1bea07828d5fd6864a585f83b1eed47ce097eb45.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 192 +- kernel/trace/trace.h | 56 - kernel/trace/trace_sched_wakeup.c | 2 +- 3 files changed, 244 insertions(+), 6 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b477926ac3bc..9f4d56f74b46 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -894,7 +894,7 @@ int __trace_bputs(unsigned long ip, const char *str) EXPORT_SYMBOL_GPL(__trace_bputs); #ifdef CONFIG_TRACER_SNAPSHOT -void tracing_snapshot_instance(struct trace_array *tr) +void tracing_snapshot_instance_cond(struct trace_array *tr, void *cond_data) { struct tracer *tracer = tr->current_trace; unsigned long flags; @@ -920,10 +920,15 @@ void tracing_snapshot_instance(struct trace_array *tr) } local_irq_save(flags); - update_max_tr(tr, current, smp_processor_id()); + update_max_tr(tr, current, smp_processor_id(), cond_data); local_irq_restore(flags); } +void tracing_snapshot_instance(struct trace_array *tr) +{ + tracing_snapshot_instance_cond(tr, NULL); +} + /** * tracing_snapshot - take a snapshot of the current buffer. * @@ -946,6 +951,54 @@ void tracing_snapshot(void) } EXPORT_SYMBOL_GPL(tracing_snapshot); +/** + * tracing_snapshot_cond - conditionally take a snapshot of the current buffer. + * @tr:The tracing instance to snapshot + * @cond_data: The data to be tested conditionally, and possibly saved + * + * This is the same as tracing_snapshot() except that the snapshot is + * conditional - the snapshot will only happen if the + * cond_snapshot.update() implementation receiving the cond_data + * returns true, which means that the trace array's cond_snapshot + * update() operation used the cond_data to determine whether the + * snapshot should be taken, and if it was, presumably saved it along + * with the snapshot. 
+ */ +void tracing_snapshot_cond(struct trace_array *tr, void *cond_data) +{ + tracing_snapshot_instance_cond(tr, cond_data); +} +EXPORT_SYMBOL_GPL(tracing_snapshot_cond); + +/** + * tracing_snapshot_cond_data - get the user data associated with a snapshot + * @tr:The tracing instance + * + * When the user enables a conditional snapshot using + * tracing_snapshot_cond_enable(), the user-defined cond_data is saved + * with the snapshot. This accessor is used to retrieve it. + * + * Should not be called from cond_snapshot.update(), since it takes + * the tr->max_lock lock, which the code calling + * cond_snapshot.update() has already done. + * + * Returns the cond_data associated with the trace array's snapshot. + */ +void *tracing_cond_snapshot_data(struct trace_array *tr) +{ + void *cond_data = NULL; + + arch_spin_lock(&tr->max_lock); + + if (tr->cond_snapshot) + cond_data = tr->cond_snapshot->cond_data; + + arch_spin_unlock(&tr->max_lock); + + return cond_data; +}
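As a rough illustration of the scheme described in the 18/29 changelog above (enable with user data plus an update() callback, then let the callback decide whether a snapshot is taken), here is a minimal user-space sketch. It is not the kernel implementation: every demo_* name is hypothetical, there is no locking, and the choice to take no snapshot when nothing is enabled is an assumption of the sketch rather than something the patch text states.

```c
/* User-space sketch of the conditional-snapshot pattern. NOT kernel code;
 * all demo_* names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_instance;

/* Plays the role of the update() callback: decide whether to snapshot,
 * and update the saved user data if so. */
typedef bool (*demo_update_fn)(struct demo_instance *inst, void *cond_data);

struct demo_instance {
	demo_update_fn update;   /* NULL while no conditional snapshot is enabled */
	void *saved_data;        /* user data associated with the instance */
	unsigned int snapshots;  /* how many snapshots were "taken" */
};

/* Only one user per instance: a second enable fails until disable is called. */
static int demo_cond_enable(struct demo_instance *inst, void *data,
			    demo_update_fn update)
{
	if (inst->update)
		return -1;
	inst->update = update;
	inst->saved_data = data;
	return 0;
}

static int demo_cond_disable(struct demo_instance *inst)
{
	if (!inst->update)
		return -1;
	inst->update = NULL;
	inst->saved_data = NULL;
	return 0;
}

/* Take a snapshot only if the callback agrees. Sketch assumption: with
 * nothing enabled, no snapshot is taken. Returns true if one was taken. */
static bool demo_snapshot_cond(struct demo_instance *inst, void *cond_data)
{
	if (!inst->update || !inst->update(inst, cond_data))
		return false;
	inst->snapshots++;
	return true;
}

/* Example callback: snapshot only when a new maximum is seen. */
static bool demo_update_on_max(struct demo_instance *inst, void *cond_data)
{
	uint64_t *max = inst->saved_data;
	uint64_t val = *(uint64_t *)cond_data;

	if (val <= *max)
		return false;
	*max = val;
	return true;
}
```

A caller would enable the instance once with a pointer to its running maximum, call demo_snapshot_cond() with each new value, and disable it when done so another user can claim the instance. Swapping in a callback that compares for inequality instead of "greater than" would give the "any change" behaviour the changelog mentions.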
[for-next][PATCH 21/29] tracing: Add hist trigger onchange() handler
From: Tom Zanussi Add support for a hist:onchange($var) handler, similar to the onmax() handler but triggering whenever there's any change in $var, not just a max. Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 3 +- kernel/trace/trace_events_hist.c | 58 +++- 2 files changed, 52 insertions(+), 9 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index dd60c14a0fb0..be6779f963c6 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4915,7 +4915,8 @@ static const char readme_msg[] = "\t.\n\n" "\tThe available handlers are:\n\n" "\tonmatch(matching.event) - invoke on addition or update\n" - "\tonmax(var) - invoke if var exceeds current max\n\n" + "\tonmax(var) - invoke if var exceeds current max\n" + "\tonchange(var)- invoke action if var changes\n\n" "\tThe available actions are:\n\n" "\t(param list)- generate synthetic event\n" "\tsave(field,...) 
- save current event fields\n" diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 571937a268a3..2f3323ca9d24 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -391,6 +391,7 @@ typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val); enum handler_id { HANDLER_ONMATCH = 1, HANDLER_ONMAX, + HANDLER_ONCHANGE, }; enum action_id { @@ -1989,7 +1990,8 @@ static int parse_action(char *str, struct hist_trigger_attrs *attrs) return ret; if ((str_has_prefix(str, "onmatch(")) || - (str_has_prefix(str, "onmax("))) { + (str_has_prefix(str, "onmax(")) || + (str_has_prefix(str, "onchange("))) { attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL); if (!attrs->action_str[attrs->n_actions]) { ret = -ENOMEM; @@ -3481,6 +3483,14 @@ static bool check_track_val_max(u64 track_val, u64 var_val) return true; } +static bool check_track_val_changed(u64 track_val, u64 var_val) +{ + if (var_val == track_val) + return false; + + return true; +} + static u64 get_track_val(struct hist_trigger_data *hist_data, struct tracing_map_elt *elt, struct action_data *data) @@ -3640,6 +3650,8 @@ static void track_data_print(struct seq_file *m, if (data->handler == HANDLER_ONMAX) seq_printf(m, "\n\tmax: %10llu", track_val); + else if (data->handler == HANDLER_ONCHANGE) + seq_printf(m, "\n\tchanged: %10llu", track_val); if (data->action == ACTION_SNAPSHOT) return; @@ -3727,14 +3739,14 @@ static int track_data_create(struct hist_trigger_data *hist_data, track_data_var_str = data->track_data.var_str; if (track_data_var_str[0] != '$') { - hist_err("For onmax(x), x must be a variable: ", track_data_var_str); + hist_err("For onmax(x) or onchange(x), x must be a variable: ", track_data_var_str); return -EINVAL; } track_data_var_str++; var_field = find_target_event_var(hist_data, NULL, NULL, track_data_var_str); if (!var_field) { - hist_err("Couldn't find onmax variable: ", track_data_var_str); + hist_err("Couldn't find 
onmax or onchange variable: ", track_data_var_str); return -EINVAL; } @@ -3751,6 +3763,14 @@ static int track_data_create(struct hist_trigger_data *hist_data, ret = PTR_ERR(track_var); goto out; } + + if (data->handler == HANDLER_ONCHANGE) + track_var = create_var(hist_data, file, "__change", sizeof(u64), "u64"); + if (IS_ERR(track_var)) { + hist_err("Couldn't create onchange variable: ", "__change"); + ret = PTR_ERR(track_var); + goto out; + } data->track_data.track_var = track_var; ret = action_create(hist_data, data); @@ -3830,6 +3850,8 @@ static int action_parse(char *str, struct action_data *data, if (handler == HANDLER_ONMAX) data->track_data.check_val = check_track_val_max; + else if (handler == HANDLER_ONCHANGE) + data->track_data.check_val = check_track_val_changed; else { hist_err("action parsing: Handler doesn't support action: ", action_name); ret = -EINVAL; @@ -3850,6 +3872,8 @@ static int action_parse(char *str, struct action_data
[for-next][PATCH 19/29] tracing: Add hist trigger snapshot() action
From: Tom Zanussi Add support for hist:handlerXXX($var).snapshot(), which will take a snapshot of the current trace buffer whenever handlerXXX is hit. As a first user, this also adds snapshot() action support for the onmax() handler i.e. hist:onmax($var).snapshot(). Also, the hist trigger key printing is moved into a separate function so the snapshot() action can print a histogram key outside the histogram display - add and use hist_trigger_print_key() for that purpose. Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 3 + kernel/trace/trace_events_hist.c | 266 +-- 2 files changed, 259 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 9f4d56f74b46..dd60c14a0fb0 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4919,6 +4919,9 @@ static const char readme_msg[] = "\tThe available actions are:\n\n" "\t(param list)- generate synthetic event\n" "\tsave(field,...) 
- save current event fields\n" +#ifdef CONFIG_TRACER_SNAPSHOT + "\tsnapshot() - snapshot the trace buffer\n" +#endif #endif ; diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 0515229e5f95..571937a268a3 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -396,6 +396,7 @@ enum handler_id { enum action_id { ACTION_SAVE = 1, ACTION_TRACE, + ACTION_SNAPSHOT, }; struct action_data { @@ -454,6 +455,83 @@ struct action_data { }; }; +struct track_data { + u64 track_val; + boolupdated; + + unsigned intkey_len; + void*key; + struct tracing_map_elt elt; + + struct action_data *action_data; + struct hist_trigger_data*hist_data; +}; + +struct hist_elt_data { + char *comm; + u64 *var_ref_vals; + char *field_var_str[SYNTH_FIELDS_MAX]; +}; + +struct snapshot_context { + struct tracing_map_elt *elt; + void*key; +}; + +static void track_data_free(struct track_data *track_data) +{ + struct hist_elt_data *elt_data; + + if (!track_data) + return; + + kfree(track_data->key); + + elt_data = track_data->elt.private_data; + if (elt_data) { + kfree(elt_data->comm); + kfree(elt_data); + } + + kfree(track_data); +} + +static struct track_data *track_data_alloc(unsigned int key_len, + struct action_data *action_data, + struct hist_trigger_data *hist_data) +{ + struct track_data *data = kzalloc(sizeof(*data), GFP_KERNEL); + struct hist_elt_data *elt_data; + + if (!data) + return ERR_PTR(-ENOMEM); + + data->key = kzalloc(key_len, GFP_KERNEL); + if (!data->key) { + track_data_free(data); + return ERR_PTR(-ENOMEM); + } + + data->key_len = key_len; + data->action_data = action_data; + data->hist_data = hist_data; + + elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL); + if (!elt_data) { + track_data_free(data); + return ERR_PTR(-ENOMEM); + } + data->elt.private_data = elt_data; + + elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL); + if (!elt_data->comm) { + track_data_free(data); + return ERR_PTR(-ENOMEM); + } + + return 
data; +} + static char last_hist_cmd[MAX_FILTER_STR_VAL]; static char hist_err_str[MAX_FILTER_STR_VAL]; @@ -1726,12 +1804,6 @@ static struct hist_field *find_event_var(struct hist_trigger_data *hist_data, return hist_field; } -struct hist_elt_data { - char *comm; - u64 *var_ref_vals; - char *field_var_str[SYNTH_FIELDS_MAX]; -}; - static u64 hist_field_var_ref(struct hist_field *hist_field, struct tracing_map_elt *elt, struct ring_buffer_event *rbe, @@ -3452,6 +3524,112 @@ static bool check_track_val(struct tracing_map_elt *elt, return data->track_data.check_val(track_val, var_val); } +#ifdef CONFIG_TRACER_SNAPSHOT +static bool cond_snapshot_update(struct trace_array *tr, void *cond_data) +{ + /* called with tr->max_lock held */ + struct track_data *track_data = tr->cond_snapshot->cond_data; + struct hist_elt_data *elt_data, *track_elt_data; + struct snapshot_context *context = cond_data; + u64 track_val; + + if (!track_data) + return
[for-next][PATCH 09/29] ring-buffer: Remove unused function ring_buffer_page_len()
From: Miroslav Benes Commit 6b7e633fe9c2 ("tracing: Remove extra zeroing out of the ring buffer page") removed the only caller of ring_buffer_page_len(). The function is now unused and may be removed. Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbe...@suse.cz Signed-off-by: Miroslav Benes Signed-off-by: Steven Rostedt (VMware) --- include/linux/ring_buffer.h | 2 -- kernel/trace/ring_buffer.c | 14 -- 2 files changed, 16 deletions(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 5b9ae62272bb..f1429675f252 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -187,8 +187,6 @@ void ring_buffer_set_clock(struct ring_buffer *buffer, void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs); bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer); -size_t ring_buffer_page_len(void *page); - size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu); size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 06e864a334bb..9a91479bbbfe 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -353,20 +353,6 @@ static void rb_init_page(struct buffer_data_page *bpage) local_set(&bpage->commit, 0); } -/** - * ring_buffer_page_len - the size of data on the page. - * @page: The page to read - * - * Returns the amount of data on the page, including buffer page header. - */ -size_t ring_buffer_page_len(void *page) -{ - struct buffer_data_page *bpage = page; - - return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS) - + BUF_PAGE_HDR_SIZE; -} - /* * Also stolen from mm/slob.c. Thanks to Mathieu Desnoyers for pointing * this issue out. -- 2.20.1
[for-next][PATCH 26/29] tracing: Add hist trigger onchange() handler test case
From: Tom Zanussi Add a test case verifying the basic functionality of the hist:onchange($var) handler. Link: http://lkml.kernel.org/r/bec87aa8ed7d81794510b3d465096a750c71fce7.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Acked-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) --- .../trigger-onchange-action-hist.tc | 28 +++ 1 file changed, 28 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc new file mode 100644 index ..064a284e4e75 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc @@ -0,0 +1,28 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: event trigger - test inter-event histogram trigger onchange action + +fail() { #msg +echo $1 +exit_fail +} + +if [ ! -f set_event ]; then +echo "event tracing is not supported" +exit_unsupported +fi + +grep -q "onchange(var)" README || exit_unsupported # version issue + +echo "Test onchange action" + +echo 'hist:keys=comm:newprio=prio:onchange($newprio).save(comm,prio) if comm=="ping"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger + +ping $LOCALHOST -c 3 +nice -n 1 ping $LOCALHOST -c 3 + +if ! grep -q "changed:" events/sched/sched_waking/hist; then +fail "Failed to create onchange action inter-event histogram" +fi + +exit 0 -- 2.20.1
[for-next][PATCH 14/29] tracing: Refactor hist trigger action code
From: Tom Zanussi The hist trigger action code currently implements two essentially hard-coded pairs of 'actions' - onmax(), which tracks a variable and saves some event fields when a max is hit, and onmatch(), which is hard-coded to generate a synthetic event. These hardcoded pairs (track max/save fields and detect match/generate synthetic event) should really be decoupled into separate components that can then be arbitrarily combined. The first component of each pair (track max/detect match) is called a 'handler' in the new code, while the second component (save fields/generate synthetic event) is called an 'action' in this scheme. This change refactors the action code to reflect this split by adding two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two actions, ACTION_SAVE and ACTION_TRACE. The new code combines them to produce the existing ONMATCH/TRACE and ONMAX/SAVE functionality, but doesn't implement the other combinations now possible. Future patches will expand these to further useful cases, such as ONMAX/TRACE, as well as add additional handlers and actions such as ONCHANGE and SNAPSHOT. Also, add abbreviated documentation for handlers and actions to README. 
Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_hist.c | 407 ++- 1 file changed, 238 insertions(+), 169 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 449d90cfa151..dfaaad582797 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -313,9 +313,9 @@ struct hist_trigger_data { struct field_var_hist *field_var_hists[SYNTH_FIELDS_MAX]; unsigned intn_field_var_hists; - struct field_var*max_vars[SYNTH_FIELDS_MAX]; - unsigned intn_max_vars; - unsigned intn_max_var_str; + struct field_var*save_vars[SYNTH_FIELDS_MAX]; + unsigned intn_save_vars; + unsigned intn_save_var_str; }; static int synth_event_create(int argc, const char **argv); @@ -383,11 +383,25 @@ struct action_data; typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, struct tracing_map_elt *elt, void *rec, -struct ring_buffer_event *rbe, +struct ring_buffer_event *rbe, void *key, struct action_data *data, u64 *var_ref_vals); +enum handler_id { + HANDLER_ONMATCH = 1, + HANDLER_ONMAX, +}; + +enum action_id { + ACTION_SAVE = 1, + ACTION_TRACE, +}; + struct action_data { + enum handler_id handler; + enum action_id action; + char*action_name; action_fn_t fn; + unsigned intn_params; char*params[SYNTH_FIELDS_MAX]; @@ -404,13 +418,11 @@ struct action_data { unsigned intvar_ref_idx; char*match_event; char*match_event_system; - char*synth_event_name; struct synth_event *synth_event; } onmatch; struct { char*var_str; - char*fn_name; unsigned intmax_var_ref_idx; struct hist_field *max_var; struct hist_field *var; @@ -1078,7 +1090,7 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields, static void action_trace(struct hist_trigger_data *hist_data, struct tracing_map_elt *elt, void *rec, -struct ring_buffer_event *rbe, +struct 
ring_buffer_event *rbe, void *key, struct action_data *data, u64 *var_ref_vals) { struct synth_event *event = data->onmatch.synth_event; @@ -1644,7 +1656,7 @@ find_match_var(struct hist_trigger_data *hist_data, char *var_name) for (i = 0; i < hist_data->n_actions; i++) { struct action_data *data = hist_data->actions[i]; - if (data->fn == action_trace) { + if (data->handler == HANDLER_ONMATCH) { char *system = data->onmatch.match_event_system; char *event_name = data->onmatch.match_event; @@ -2076,7 +2088,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt) } } - n_str = hist_data->n_field_var_str +
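The handler/action split described in the 14/29 changelog can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: the demo_* names, the simplified "handler(...).action(...)" parsing, and the rule that any action other than save() names a synthetic event are all inventions of the sketch, not the code in trace_events_hist.c. What it takes from the changelog is that only ONMATCH/TRACE and ONMAX/SAVE are implemented at this point in the series.

```c
/* User-space sketch of decoupled handlers and actions. NOT kernel code;
 * all demo_* names and the parsing rules are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_handler { DEMO_ONMATCH = 1, DEMO_ONMAX };
enum demo_action { DEMO_SAVE = 1, DEMO_TRACE };

struct demo_action_data {
	enum demo_handler handler;
	enum demo_action action;
};

/* Parse a "handler(...).action(...)" string into its two components.
 * Returns 0 on success, -1 if the handler or the ")." separator is missing. */
static int demo_parse(const char *str, struct demo_action_data *out)
{
	const char *action;

	if (strncmp(str, "onmatch(", 8) == 0)
		out->handler = DEMO_ONMATCH;
	else if (strncmp(str, "onmax(", 6) == 0)
		out->handler = DEMO_ONMAX;
	else
		return -1;

	action = strstr(str, ").");
	if (!action)
		return -1;
	action += 2; /* skip the ")." that separates handler from action */

	if (strncmp(action, "save(", 5) == 0)
		out->action = DEMO_SAVE;
	else
		out->action = DEMO_TRACE; /* sketch assumption: a synthetic event name */

	return 0;
}

/* Only the two pairings the changelog says are implemented at this stage. */
static bool demo_supported(const struct demo_action_data *d)
{
	return (d->handler == DEMO_ONMATCH && d->action == DEMO_TRACE) ||
	       (d->handler == DEMO_ONMAX && d->action == DEMO_SAVE);
}
```

Keeping the parsed handler and action as two independent fields is what lets later combinations (for example an onmax handler paired with a trace action) be enabled by widening the supported set rather than by adding new hard-coded pairs.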
[for-next][PATCH 16/29] tracing: Split up onmatch action data
From: Tom Zanussi Currently, the onmatch action data binds the onmatch action to data related to synthetic event generation. Since we want to allow the onmatch handler to potentially invoke a different action, and because we expect other handlers to generate synthetic events, we need to separate the data related to these two functions. Also rename the onmatch data to something more descriptive, and create and use common action data destroy function. Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 12 +++- kernel/trace/trace_events_hist.c | 103 --- 2 files changed, 63 insertions(+), 52 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b583ff7656bb..b477926ac3bc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4700,6 +4700,7 @@ static const char readme_msg[] = "\t[:size=#entries]\n" "\t[:pause][:continue][:clear]\n" "\t[:name=histname1]\n" + "\t[:.]\n" "\t[if ]\n\n" "\tWhen a matching event is hit, an entry is added to a hash\n" "\ttable using the key(s) and value(s) named, and the value of a\n" @@ -4741,7 +4742,16 @@ static const char readme_msg[] = "\tThe enable_hist and disable_hist triggers can be used to\n" "\thave one event conditionally start and stop another event's\n" "\talready-attached hist trigger. The syntax is analagous to\n" - "\tthe enable_event and disable_event triggers.\n" + "\tthe enable_event and disable_event triggers.\n\n" + "\tHist trigger handlers and actions are executed whenever a\n" + "\ta histogram entry is added or updated. They take the form:\n\n" + "\t.\n\n" + "\tThe available handlers are:\n\n" + "\tonmatch(matching.event) - invoke on addition or update\n" + "\tonmax(var) - invoke if var exceeds current max\n\n" + "\tThe available actions are:\n\n" + "\t(param list)- generate synthetic event\n" + "\tsave(field,...) 
- save current event fields\n" #endif ; diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index dfaaad582797..0b843ecef547 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -405,21 +405,22 @@ struct action_data { unsigned intn_params; char*params[SYNTH_FIELDS_MAX]; + /* +* When a histogram trigger is hit, the values of any +* references to variables, including variables being passed +* as parameters to synthetic events, are collected into a +* var_ref_vals array. This var_ref_idx is the index of the +* first param in the array to be passed to the synthetic +* event invocation. +*/ + unsigned intvar_ref_idx; + struct synth_event *synth_event; + union { struct { - /* -* When a histogram trigger is hit, the values of any -* references to variables, including variables being passed -* as parameters to synthetic events, are collected into a -* var_ref_vals array. This var_ref_idx is the index of the -* first param in the array to be passed to the synthetic -* event invocation. -*/ - unsigned intvar_ref_idx; - char*match_event; - char*match_event_system; - struct synth_event *synth_event; - } onmatch; + char*event; + char*event_system; + } match_data; struct { char*var_str; @@ -1093,9 +1094,9 @@ static void action_trace(struct hist_trigger_data *hist_data, struct ring_buffer_event *rbe, void *key, struct action_data *data, u64 *var_ref_vals) { - struct synth_event *event = data->onmatch.synth_event; + struct synth_event *event = data->synth_event; - trace_synth(event, var_ref_vals, data->onmatch.var_ref_idx); + trace_synth(event, var_ref_vals, data->var_ref_idx); } struct hist_var_data { @@ -1657,8 +1658,8 @@ find_match_var(struct hist_trigger_data *hist_data, char *var_name) struct action_data *data = hist_data->actions[i]; if
[for-next][PATCH 17/29] tracing: Generalize hist trigger onmax and save action
From: Tom Zanussi The action refactor code allowed actions and handlers to be separated, but the existing onmax handler and save action code is still not flexible enough to handle arbitrary coupling. This change generalizes them and in the process makes additional handlers and actions easier to implement. The onmax action can be broken up and thought of as two separate components - a variable to be tracked (the parameter given to the onmax($var_to_track) function) and an invisible variable created to save the ongoing result of doing something with that variable, such as saving the max value of that variable so far seen. Separating it out like this and renaming it appropriately allows us to use the same code for similar tracking functions such as onchange($var_to_track), which would just track the last value seen rather than the max seen so far, which is useful in some situations. Additionally, because different handlers and actions may want to save and access data differently e.g. save and retrieve tracking values as local variables vs something more global, save_val() and get_val() interface functions are introduced and max-specific implementations are used instead. The same goes for the code that checks whether a maximum has been hit - a generic check_val() interface and max-checking implementation is used instead, which allows future patches to make use of the same code using their own implementations of similar functionality. 
Link: http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_hist.c | 236 +-- 1 file changed, 160 insertions(+), 76 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 0b843ecef547..0515229e5f95 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -386,6 +386,8 @@ typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, struct ring_buffer_event *rbe, void *key, struct action_data *data, u64 *var_ref_vals); +typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val); + enum handler_id { HANDLER_ONMATCH = 1, HANDLER_ONMAX, @@ -423,15 +425,35 @@ struct action_data { } match_data; struct { + /* +* var_str contains the $-unstripped variable +* name referenced by var_ref, and used when +* printing the action. Because var_ref +* creation is deferred to create_actions(), +* we need a per-action way to save it until +* then, thus var_str. +*/ char*var_str; - unsigned intmax_var_ref_idx; - struct hist_field *max_var; - struct hist_field *var; - } onmax; + + /* +* var_ref refers to the variable being +* tracked e.g onmax($var). +*/ + struct hist_field *var_ref; + + /* +* track_var contains the 'invisible' tracking +* variable created to keep the current +* e.g. max value. 
+*/ + struct hist_field *track_var; + + check_track_val_fn_tcheck_val; + action_fn_t save_data; + } track_data; }; }; - static char last_hist_cmd[MAX_FILTER_STR_VAL]; static char hist_err_str[MAX_FILTER_STR_VAL]; @@ -3238,10 +3260,10 @@ static void update_field_vars(struct hist_trigger_data *hist_data, hist_data->n_field_vars, 0); } -static void update_max_vars(struct hist_trigger_data *hist_data, - struct tracing_map_elt *elt, - struct ring_buffer_event *rbe, - void *rec) +static void save_track_data_vars(struct hist_trigger_data *hist_data, +struct tracing_map_elt *elt, void *rec, +struct ring_buffer_event *rbe, void *key, +struct action_data *data, u64 *var_ref_vals) { __update_field_vars(elt, rbe, rec, hist_data->save_vars, hist_data->n_save_vars, hist_data->n_field_var_str); @@ -3379,14 +3401,67 @@ create_target_field_var(struct hist_trigger_data *target_hist_data, return create_field_var(target_hist_data, file, var_name); } -static void onmax_print(struct seq_file *m, -
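As an illustration of the generic check_val() idea described in the change log, here is a minimal userspace sketch. Only the check_track_val_fn_t signature is taken from the patch; struct track_state, track_update() and the two checker functions are hypothetical names invented for this example, and u64 is replaced by uint64_t so that it builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the typedef added by the patch (u64 -> uint64_t). */
typedef bool (*check_track_val_fn_t)(uint64_t track_val, uint64_t var_val);

/* Hypothetical onmax-style check: act only when a new maximum is seen. */
bool check_max(uint64_t track_val, uint64_t var_val)
{
	return var_val > track_val;
}

/* Hypothetical onchange-style check: act whenever the value differs. */
bool check_changed(uint64_t track_val, uint64_t var_val)
{
	return var_val != track_val;
}

/* Hypothetical stand-in for the 'invisible' tracking variable. */
struct track_state {
	uint64_t track_val;
	check_track_val_fn_t check_val;
};

/*
 * Generic update path: the caller does not know which policy is in
 * use, it only asks check_val() whether the tracked value should be
 * replaced. Returns true when the action would be invoked.
 */
bool track_update(struct track_state *st, uint64_t var_val)
{
	assert(st != NULL && st->check_val != NULL);

	if (!st->check_val(st->track_val, var_val))
		return false;

	st->track_val = var_val;
	return true;
}
```

With this split, switching from max tracking to change tracking is only a matter of pointing check_val at a different function; the update path stays the same.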
[for-next][PATCH 27/29] tracing: Add alternative synthetic event trace action test case
From: Tom Zanussi Add a test case for the alternative trace(http://lkml.kernel.org/r/0616d18423ab1dfdbf333bce9c92ac4fa0779207.1550100284.git.tom.zanu...@linux.intel.com Acked-by: Masami Hiramatsu Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- .../inter-event/trigger-trace-action-hist.tc | 42 +++ 1 file changed, 42 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc new file mode 100644 index ..8021d60aafec --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc @@ -0,0 +1,42 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: event trigger - test inter-event histogram trigger trace action + +fail() { #msg +echo $1 +exit_fail +} + +if [ ! -f set_event ]; then +echo "event tracing is not supported" +exit_unsupported +fi + +if [ ! -f synthetic_events ]; then +echo "synthetic event is not supported" +exit_unsupported +fi + +grep -q "trace(" README || exit_unsupported # version issue + +echo "Test create synthetic event" + +echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events +if [ ! 
-d events/synthetic/wakeup_latency ]; then +fail "Failed to create wakeup_latency synthetic event" +fi + +echo "Test create histogram for synthetic event using trace action" +echo "Test histogram variables,simple expression support and trace action" + +echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="ping"' > events/sched/sched_wakeup/trigger +echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_wakeup).trace(wakeup_latency,$wakeup_lat,next_pid,next_comm) if next_comm=="ping"' > events/sched/sched_switch/trigger +echo 'hist:keys=comm,pid,lat:wakeup_lat=lat:sort=lat' > events/synthetic/wakeup_latency/trigger + +ping $LOCALHOST -c 5 + +if ! grep -q "ping" events/synthetic/wakeup_latency/hist; then +fail "Failed to create trace action inter-event histogram" +fi + +exit 0 -- 2.20.1
[for-next][PATCH 23/29] tracing: Add alternative synthetic event trace action syntax
From: Tom Zanussi

Add a 'trace(synthetic_event_name, params)' alternative to
synthetic_event_name(params).

Currently, the syntax used for generating synthetic events is to
invoke synthetic_event_name(params) i.e. use the synthetic event name
as a function call.

Users requested a new form that more explicitly shows that the
synthetic event is in effect being traced. In this version, a new
'trace()' keyword is used, and the synthetic event name is passed in
as the first argument.

In addition, for the sake of consistency with other actions, change
the documentation to emphasize the trace() form over the function-call
form, which remains documented as equivalent.

Link: http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanu...@linux.intel.com
Signed-off-by: Tom Zanussi
Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/histogram.rst | 54 +--
 kernel/trace/trace.c | 2 +-
 kernel/trace/trace_events_hist.c | 42 +---
 3 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 79476c906b1a..0ea59d45aef1 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1873,31 +1873,45 @@ The available handlers are:

 The available actions are:

-  - (param list) - generate synthetic event
+  - trace(,param list) - generate synthetic event
   - save(field,...)- save current event fields
   - snapshot() - snapshot the trace buffer

 The following commonly-used handler.action pairs are available:

-  - onmatch(matching.event).(param list)
+  - onmatch(matching.event).trace(,param list)

-The 'onmatch(matching.event).(params)' hist
-trigger action is invoked whenever an event matches and the
-histogram entry would be added or updated.
It causes the named -synthetic event to be generated with the values given in the +The 'onmatch(matching.event).trace(,param +list)' hist trigger action is invoked whenever an event matches +and the histogram entry would be added or updated. It causes the +named synthetic event to be generated with the values given in the 'param list'. The result is the generation of a synthetic event that consists of the values contained in those variables at the -time the invoking event was hit. - -The 'param list' consists of one or more parameters which may be -either variables or fields defined on either the 'matching.event' -or the target event. The variables or fields specified in the -param list may be either fully-qualified or unqualified. If a -variable is specified as unqualified, it must be unique between -the two events. A field name used as a param can be unqualified -if it refers to the target event, but must be fully qualified if -it refers to the matching event. A fully-qualified name is of the -form 'system.event_name.$var_name' or 'system.event_name.field'. +time the invoking event was hit. For example, if the synthetic +event name is 'wakeup_latency', a wakeup_latency event is +generated using onmatch(event).trace(wakeup_latency,arg1,arg2). + +There is also an equivalent alternative form available for +generating synthetic events. In this form, the synthetic event +name is used as if it were a function name. For example, using +the 'wakeup_latency' synthetic event name again, the +wakeup_latency event would be generated by invoking it as if it +were a function call, with the event field values passed in as +arguments: onmatch(event).wakeup_latency(arg1,arg2). The syntax +for this form is: + + onmatch(matching.event).(param list) + +In either case, the 'param list' consists of one or more +parameters which may be either variables or fields defined on +either the 'matching.event' or the target event. 
The variables or +fields specified in the param list may be either fully-qualified +or unqualified. If a variable is specified as unqualified, it +must be unique between the two events. A field name used as a +param can be unqualified if it refers to the target event, but +must be fully qualified if it refers to the matching event. A +fully-qualified name is of the form 'system.event_name.$var_name' +or 'system.event_name.field'. The 'matching.event' specification is simply the fully qualified event name of the event that matches the target event for the @@ -1928,6 +1942,12 @@ The following commonly-used handler.action pairs are available: wakeup_new_test($testpid) if comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger +Or, equivalently, using the 'trace' keyword syntax: + +# echo
[for-next][PATCH 28/29] tracing: Add hist trigger action expected fail test case
From: Tom Zanussi Add a test case verifying that basic action combinations fail as expected. Link: http://lkml.kernel.org/r/1790bf93e01dbdfa1b4af945f42147d92bd565aa.1550100284.git.tom.zanu...@linux.intel.com Acked-by: Masami Hiramatsu Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- .../inter-event/trigger-action-hist-xfail.tc | 30 +++ 1 file changed, 30 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc new file mode 100644 index ..1221240f8cf6 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc @@ -0,0 +1,30 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: event trigger - test inter-event histogram trigger expected fail actions + +fail() { #msg +echo $1 +exit_fail +} + +if [ ! -f set_event ]; then +echo "event tracing is not supported" +exit_unsupported +fi + +if [ ! -f snapshot ]; then +echo "snapshot is not supported" +exit_unsupported +fi + +grep -q "snapshot()" README || exit_unsupported # version issue + +echo "Test expected snapshot action failure" + +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail + +echo "Test expected save action failure" + +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail + +exit_xfail -- 2.20.1
[for-next][PATCH 29/29] tracing: Comment why cond_snapshot is checked outside of max_lock protection
From: "Steven Rostedt (VMware)"

tr->cond_snapshot must be NULL before it can be set. It can go to NULL
when a trace event hist trigger is created or removed, and can only be
modified under the max_lock spin lock. But because it can only be set
to something other than NULL while holding both the max_lock spin lock
and the trace_types_lock, the check for a non-NULL value can be done
under the trace_types_lock alone, failing out without having to grab
the max_lock spin lock.

This is very subtle, and deserves a comment.

Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace.c | 8
 1 file changed, 8 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0460cc0f28fd..2cf3c747a357 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1116,6 +1116,14 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data,
 		goto fail_unlock;
 	}

+	/*
+	 * The cond_snapshot can only change to NULL without the
+	 * trace_types_lock. We don't care if we race with it going
+	 * to NULL, but we want to make sure that it's not set to
+	 * something other than NULL when we get here, which we can
+	 * do safely with only holding the trace_types_lock and not
+	 * having to take the max_lock.
+	 */
 	if (tr->cond_snapshot) {
 		ret = -EBUSY;
 		goto fail_unlock;
--
2.20.1
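The locking rule in the change log can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the kernel code: pthread mutexes stand in for trace_types_lock and the max_lock spin lock, a C11 atomic pointer stands in for tr->cond_snapshot, and the function names snapshot_cond_enable() and snapshot_cond_disable() are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-ins (assumptions) for trace_types_lock and tr->max_lock. */
static pthread_mutex_t types_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for tr->cond_snapshot; atomic so the unlocked read is defined. */
static _Atomic(void *) cond_snapshot;

/*
 * Setting a non-NULL value requires BOTH locks. Because of that, while
 * we hold types_lock nobody else can turn NULL into non-NULL, so the
 * busy check does not need max_lock. A concurrent change to NULL may
 * race with the check, which at worst produces a spurious -EBUSY.
 */
int snapshot_cond_enable(void *cond)
{
	int ret = 0;

	assert(cond != NULL);

	pthread_mutex_lock(&types_lock);

	if (atomic_load(&cond_snapshot)) {
		ret = -EBUSY;
		goto out;
	}

	pthread_mutex_lock(&max_lock);
	atomic_store(&cond_snapshot, cond);
	pthread_mutex_unlock(&max_lock);
out:
	pthread_mutex_unlock(&types_lock);
	return ret;
}

/* Clearing only needs max_lock, matching "can only change to NULL". */
void snapshot_cond_disable(void)
{
	pthread_mutex_lock(&max_lock);
	atomic_store(&cond_snapshot, NULL);
	pthread_mutex_unlock(&max_lock);
}
```

The point of the sketch is the asymmetry: the NULL to non-NULL transition is fenced by the outer lock, so the early check is safe under that lock alone.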
[for-next][PATCH 24/29] tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
From: Tom Zanussi Apparently this directory was missed in the license cleanup process - add the missing identifiers to the trigger/inter-event test cases. Link: http://lkml.kernel.org/r/6f9828c2cfb0b378ebd217a39a1b44f063fc17fb.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- .../test.d/trigger/inter-event/trigger-extended-error-support.tc | 1 + .../test.d/trigger/inter-event/trigger-field-variable-support.tc | 1 + .../trigger/inter-event/trigger-inter-event-combined-hist.tc | 1 + .../test.d/trigger/inter-event/trigger-multi-actions-accept.tc | 1 + .../test.d/trigger/inter-event/trigger-onmatch-action-hist.tc| 1 + .../trigger/inter-event/trigger-onmatch-onmax-action-hist.tc | 1 + .../test.d/trigger/inter-event/trigger-onmax-action-hist.tc | 1 + .../trigger/inter-event/trigger-synthetic-event-createremove.tc | 1 + 8 files changed, 8 insertions(+) diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc index 401104344593..9912616a8672 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event trigger - test extended error support diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc index f59b2a9a1f22..77be6e1f6e7b 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event 
trigger - test field variable support fail() { #msg diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc index 524d9ce361e2..f3eb8aacec0e 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event trigger - test inter-event combined histogram trigger fail() { #msg diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc index 4ddc546771b5..d281f056f980 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event trigger - test multiple actions on hist trigger fail() { #msg diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc index 39fb65b0cd9f..a708f0e7858a 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event trigger - test inter-event histogram trigger onmatch action fail() { #msg diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc index 81ab3939c96a..dfce6932d8be 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 # description: event trigger - test inter-event histogram trigger onmatch-onmax action fail() { #msg diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc index 1180ab5f0845..0035995c2194 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-2.0 #
[for-next][PATCH 25/29] tracing: Add hist trigger snapshot() action test case
From: Tom Zanussi Add a test case verifying the basic functionality of the hist:snapshot() action. Link: http://lkml.kernel.org/r/c0555f462cbfe56dadfec6e63e531e109bd72930.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Acked-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) --- .../trigger-snapshot-action-hist.tc | 43 +++ 1 file changed, 43 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc new file mode 100644 index ..18fff69fc433 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc @@ -0,0 +1,43 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: event trigger - test inter-event histogram trigger snapshot action + +fail() { #msg +echo $1 +exit_fail +} + +if [ ! -f set_event ]; then +echo "event tracing is not supported" +exit_unsupported +fi + +if [ ! -f snapshot ]; then +echo "snapshot is not supported" +exit_unsupported +fi + +grep -q "onchange(var)" README || exit_unsupported # version issue + +grep -q "snapshot()" README || exit_unsupported # version issue + +echo "Test snapshot action" + +echo 1 > /sys/kernel/debug/tracing/events/sched/enable + +echo 'hist:keys=comm:newprio=prio:onchange($newprio).save(comm,prio):onchange($newprio).snapshot() if comm=="ping"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger + +ping $LOCALHOST -c 3 +nice -n 1 ping $LOCALHOST -c 3 + +echo 0 > tracing_on + +if ! grep -q "changed:" events/sched/sched_waking/hist; then +fail "Failed to create onchange action inter-event histogram" +fi + +if ! grep -q "comm=ping" snapshot; then +fail "Failed to create snapshot action inter-event histogram" +fi + +exit 0 -- 2.20.1
[for-next][PATCH 22/29] tracing: Add hist trigger onchange() handler Documentation
From: Tom Zanussi Add Documentation for the hist:onchange($var) handler. Link: http://lkml.kernel.org/r/ab54b7383b265609fda52648a8fbfbd2631a640f.1550100284.git.tom.zanu...@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/histogram.rst | 98 +++ 1 file changed, 98 insertions(+) diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 353317bc3825..79476c906b1a 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -1869,6 +1869,7 @@ The available handlers are: - onmatch(matching.event)- invoke action on any addition or update - onmax(var) - invoke action if var exceeds current max + - onchange(var) - invoke action if var changes The available actions are: @@ -2140,6 +2141,103 @@ The following commonly-used handler.action pairs are available: <...>-2102 [005] d..4 309.875185: sched_wake_idle_without_ipi: cpu=1 -0 [001] d..3 309.875200: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2103 next_prio=19 + - onchange(var).save(field,...) + +The 'onchange(var).save(field,...)' hist trigger action is invoked +whenever the value of 'var' associated with a histogram entry +changes. + +The end result is that the trace event fields specified as the +onchange.save() params will be saved if 'var' changes for that +hist trigger entry. This allows context from the event that +changed the value to be saved for later reference. When the +histogram is displayed, additional fields displaying the saved +values will be printed. + + - onchange(var).snapshot() + +The 'onchange(var).snapshot()' hist trigger action is invoked +whenever the value of 'var' associated with a histogram entry +changes. + +The end result is that a global snapshot of the trace buffer will +be saved in the tracing/snapshot file if 'var' changes for any +hist trigger entry. 
+ +Note that in this case the changed value is a global variable +associated withe current trace instance. The key of the specific +trace event that caused the value to change and the global value +itself are displayed, along with a message stating that a snapshot +has been taken and where to find it. The user can use the key +information displayed to locate the corresponding bucket in the +histogram for even more detail. + +As an example the below defines a hist trigger on the tcp_probe +event, keyed on dport. Whenever a tcp_probe event occurs, the +cwnd field is checked against the current value stored in the +$cwnd variable. If the value has changed, a snapshot is taken. +As part of the setup, all the scheduler and tcp events are also +enabled, which are the events that will show up in the snapshot +when it is taken at some point: + +# echo 1 > /sys/kernel/debug/tracing/events/sched/enable +# echo 1 > /sys/kernel/debug/tracing/events/tcp/enable + +# echo 'hist:keys=dport:cwnd=snd_cwnd: \ +onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \ + onchange($cwnd).snapshot()' >> \ + /sys/kernel/debug/tracing/events/tcp/tcp_probe/trigger + +When the histogram is displayed, for each bucket the tracked value +and the saved values corresponding to that value are displayed +following the rest of the fields. + +If a snaphot was taken, there is also a message indicating that, +along with the value and event that triggered the snapshot: + +# cat /sys/kernel/debug/tracing/events/tcp/tcp_probe/hist + { dport: 1521 } hitcount: 8 + changed: 10 snd_wnd: 35456 srtt: 154262 rcv_wnd: 42112 + + { dport: 80 } hitcount: 23 + changed: 10 snd_wnd: 28960 srtt: 19604 rcv_wnd: 29312 + + { dport: 9001 } hitcount:172 + changed: 10 snd_wnd: 48384 srtt: 260444 rcv_wnd: 55168 + + { dport:443 } hitcount:211 + changed: 10 snd_wnd: 26960 srtt: 17379 rcv_wnd: 28800 + +Snapshot taken (see tracing/snapshot). 
Details: +triggering value { onchange($cwnd) }: 10 +triggered by event with key: { dport: 80 } + +Totals: +Hits: 414 +Entries: 4 +Dropped: 0 + +In the above case, the event that triggered the snapshot has the +key with dport == 80. If you look at the bucket that has 80 as +the key, you'll find the additional values save()'d along with the +changed value for that bucket, which should be the same as the +global changed value (since that was the same value that triggered +the global snapshot). + +And finally, looking at the snapshot data should
[for-next][PATCH 10/29] tracing: Change the function format to display function names by perf
From: Changbin Du Here is an example for this change. $ sudo perf record -e 'ftrace:function' --filter='ip==schedule' $ sudo perf report The output of perf before this patch: \# Samples: 100 of event 'ftrace:function' \# Event count (approx.): 100 \# \# Overhead Trace output \# .. \# 51.00% 81f6aaa0 <-- 81158e8d 29.00% 81f6aaa0 <-- 8116ccb2 8.00% 81f6aaa0 <-- 81f6f2ed 4.00% 81f6aaa0 <-- 811628db 4.00% 81f6aaa0 <-- 81f6ec5b 2.00% 81f6aaa0 <-- 81f6f21a 1.00% 81f6aaa0 <-- 811b04af 1.00% 81f6aaa0 <-- 8143ce17 After this patch: \# Samples: 36 of event 'ftrace:function' \# Event count (approx.): 36 \# \# Overhead Trace output \# \# 38.89% schedule <-- schedule_hrtimeout_range_clock 27.78% schedule <-- worker_thread 13.89% schedule <-- schedule_timeout 11.11% schedule <-- smpboot_thread_fn 5.56% schedule <-- rcu_gp_kthread 2.78% schedule <-- exit_to_usermode_loop Link: http://lkml.kernel.org/r/20190209161919.32350-1-changbin...@gmail.com Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_entries.h | 41 +--- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 06bb2fd9a56c..fc8e97328e54 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -65,7 +65,8 @@ FTRACE_ENTRY_REG(function, ftrace_entry, __field(unsigned long, parent_ip ) ), - F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip), + F_printk(" %ps <-- %ps", +(void *)__entry->ip, (void *)__entry->parent_ip), FILTER_TRACE_FN, @@ -83,7 +84,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry, __field_desc( int,graph_ent, depth ) ), - F_printk("--> %lx (%d)", __entry->func, __entry->depth), + F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth), FILTER_OTHER ); @@ -102,8 +103,8 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry, __field_desc( int,ret,depth ) ), - F_printk("<-- %lx (%d) (start: %llx end: %llx) over: %d", -__entry->func, 
__entry->depth, + F_printk("<-- %ps (%d) (start: %llx end: %llx) over: %d", +(void *)__entry->func, __entry->depth, __entry->calltime, __entry->rettime, __entry->depth), @@ -167,12 +168,6 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry, #define FTRACE_STACK_ENTRIES 8 -#ifndef CONFIG_64BIT -# define IP_FMT "%08lx" -#else -# define IP_FMT "%016lx" -#endif - FTRACE_ENTRY(kernel_stack, stack_entry, TRACE_STACK, @@ -182,12 +177,13 @@ FTRACE_ENTRY(kernel_stack, stack_entry, __dynamic_array(unsigned long, caller ) ), - F_printk("\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n" -"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n" -"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n", -__entry->caller[0], __entry->caller[1], __entry->caller[2], -__entry->caller[3], __entry->caller[4], __entry->caller[5], -__entry->caller[6], __entry->caller[7]), + F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n" +"\t=> %ps\n\t=> %ps\n\t=> %ps\n" +"\t=> %ps\n\t=> %ps\n", +(void *)__entry->caller[0], (void *)__entry->caller[1], +(void *)__entry->caller[2], (void *)__entry->caller[3], +(void *)__entry->caller[4], (void *)__entry->caller[5], +(void *)__entry->caller[6], (void *)__entry->caller[7]), FILTER_OTHER ); @@ -201,12 +197,13 @@ FTRACE_ENTRY(user_stack, userstack_entry, __array(unsigned long, caller, FTRACE_STACK_ENTRIES ) ), - F_printk("\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n" -"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n" -"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n", -__entry->caller[0], __entry->caller[1], __entry->caller[2], -__entry->caller[3], __entry->caller[4], __entry->caller[5], -__entry->caller[6], __entry->caller[7]), + F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n" +"\t=> %ps\n\t=> %ps\n\t=> %ps\n" +"\t=> %ps\n\t=> %ps\n", +(void *)__entry->caller[0], (void *)__entry->caller[1], +(void *)__entry->caller[2], (void *)__entry->caller[3], +(void
[for-next][PATCH 03/29] tracing: Annotate implicit fall through in predicate_parse()
From: Mathieu Malaterre

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit removes the following warning:

  kernel/trace/trace_events_filter.c:494:8: warning: this statement may fall through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-2-ma...@debian.org
Signed-off-by: Mathieu Malaterre
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_events_filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 27821480105e..eb694756c4bb 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -495,6 +495,7 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 				ptr++;
 				break;
 			}
+			/* fall through */
 		default:
 			parse_error(pe, FILT_ERR_TOO_MANY_PREDS, next - str);
--
2.20.1
[for-next][PATCH 11/29] ftrace: Allow enabling of filters via index of available_filter_functions
From: "Steven Rostedt (VMware)"

Enabling a large number of functions by echoing in a large subset of
the functions in available_filter_functions can take a very long time.
The process requires testing all functions registered by the function
tracer (which is in the tens of thousands), and doing a kallsyms lookup
to convert the ip address into a name, then comparing that name with
the string passed in.

When a function causes the function tracer to crash the system, a
binary bisect of the available_filter_functions can be done to find
the culprit. But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it
basically an O(n^2) operation. With 40,000 functions, that ends up
being 1,600,000,000 operations! And enabling this can take over 20
minutes.

As a quick speed up, if a number is passed into one of the filter
files, instead of doing a search, it just enables the function at the
corresponding line of the available_filter_functions file. That is:

 # echo 50 > set_ftrace_filter
 # cat set_ftrace_filter
 x86_pmu_commit_txn

 # head -50 available_filter_functions | tail -1
 x86_pmu_commit_txn

This allows setting of half the available_filter_functions to take
place in less than a second!

 # time seq 2 > set_ftrace_filter

 real 0m0.042s
 user 0m0.005s
 sys 0m0.015s

 # wc -l set_ftrace_filter
 2 set_ftrace_filter

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/ftrace.rst | 38 ++
 kernel/trace/ftrace.c | 30 +++
 kernel/trace/trace.h | 1 +
 kernel/trace/trace_events_filter.c | 5
 4 files changed, 74 insertions(+)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 6ce2763a2a3e..7c5e6d6ab5d1 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -233,6 +233,12 @@ of ftrace. Here is a list of some of the key files:
 	This interface also allows for commands to be used. See the
 	"Filter commands" section for more details.
+ As a speed up, since processing strings can't be quite expensive + and requires a check of all functions registered to tracing, instead + an index can be written into this file. A number (starting with "1") + written will instead select the same corresponding at the line position + of the "available_filter_functions" file. + set_ftrace_notrace: This has an effect opposite to that of @@ -2835,6 +2841,38 @@ Produces:: We can see that there's no more lock or preempt tracing. +Selecting function filters via index + + +Because processing of strings is expensive (the address of the function +needs to be looked up before comparing to the string being passed in), +an index can be used as well to enable functions. This is useful in the +case of setting thousands of specific functions at a time. By passing +in a list of numbers, no string processing will occur. Instead, the function +at the specific location in the internal array (which corresponds to the +functions in the "available_filter_functions" file), is selected. 
+
+::
+
+  # echo 1 > set_ftrace_filter
+
+Will select the first function listed in "available_filter_functions"
+
+::
+
+  # head -1 available_filter_functions
+  trace_initcall_finish_cb
+
+  # cat set_ftrace_filter
+  trace_initcall_finish_cb
+
+  # head -50 available_filter_functions | tail -1
+  x86_pmu_commit_txn
+
+  # echo 1 50 > set_ftrace_filter
+  # cat set_ftrace_filter
+  trace_initcall_finish_cb
+  x86_pmu_commit_txn

 Dynamic ftrace with the function graph tracer
 ---------------------------------------------

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index aac7847c0214..fa79323331b2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3701,6 +3701,31 @@ enter_record(struct ftrace_hash *hash, struct dyn_ftrace *rec, int clear_filter)
 	return ret;
 }

+static int
+add_rec_by_index(struct ftrace_hash *hash, struct ftrace_glob *func_g,
+		 int clear_filter)
+{
+	long index = simple_strtoul(func_g->search, NULL, 0);
+	struct ftrace_page *pg;
+	struct dyn_ftrace *rec;
+
+	/* The index starts at 1 */
+	if (--index < 0)
+		return 0;
+
+	do_for_each_ftrace_rec(pg, rec) {
+		if (pg->index <= index) {
+			index -= pg->index;
+			/* this is a double loop, break goes to the next page */
+			break;
+		}
+		rec = &pg->records[index];
+		enter_record(hash, rec, clear_filter);
+		return 1;
+	} while_for_each_ftrace_rec();
+	return 0;
+}
+
 static int
 ftrace_match_record(struct dyn_ftrace *rec, struct ftrace_glob *func_g,
 		struct ftrace_glob *mod_g, int exclude_mod)
@@ -3769,6 +3794,11 @@ match_records(struct
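The page walk in add_rec_by_index() can be sketched in plain C as below. This is a simplified illustration, not the kernel implementation: struct page_sketch, struct rec_sketch and rec_by_index() are hypothetical names, the ftrace page list is modelled as a singly linked list, and the do_for_each_ftrace_rec() double loop is replaced by an ordinary for loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dyn_ftrace. */
struct rec_sketch {
	unsigned long ip;
};

/* Hypothetical stand-in for struct ftrace_page. */
struct page_sketch {
	struct page_sketch *next;
	struct rec_sketch *records;
	int index;		/* number of records used in this page */
};

/*
 * Return the record at 1-based position 'index' across all pages, or
 * NULL if the position is out of range. Each page that lies entirely
 * before the target is skipped by subtracting its record count, so
 * the cost is proportional to the number of pages, not to the number
 * of functions, and no name lookup is involved.
 */
struct rec_sketch *rec_by_index(struct page_sketch *pages, long index)
{
	struct page_sketch *pg;

	/* The index starts at 1 */
	if (--index < 0)
		return NULL;

	for (pg = pages; pg; pg = pg->next) {
		assert(pg->index >= 0);

		if (pg->index <= index) {
			index -= pg->index;
			continue;	/* target is in a later page */
		}
		return &pg->records[index];
	}
	return NULL;
}
```

For example, with a first page holding 2 records and a second holding 3, position 3 resolves to the first record of the second page, and position 6 is out of range.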
Re: [RFC PATCH net-next v3 13/21] ethtool: provide timestamping information in GET_INFO request
On Wed, 20 Feb 2019 14:00:07 +0100, Michal Kubecek wrote: > On Tue, Feb 19, 2019 at 07:00:48PM -0800, Jakub Kicinski wrote: > > On Mon, 18 Feb 2019 19:22:29 +0100 (CET), Michal Kubecek wrote: > > > Add timestamping information as provided by ETHTOOL_GET_TS_INFO ioctl > > > command in GET_INFO reply if ETH_INFO_IM_TSINFO flag is set in the > > > request. > > > > > > Add constants for counts of HWTSTAMP_TX_* and HWTSTAMP_FILTER_* constants > > > and provide symbolic names for timestamping related values so that they > > > can > > > be retrieved in GET_STRSET and GET_INFO requests. > > > > What's the reason for providing the symbolic names? > > One of the goals I had was to reduce the need to keep the lists of > possible values in sync between kernel and userspace ethtool and other > users of the interface so that when a new value is added, we don't have > to update all userspace tools to be able to use or present it. > > This already works in ethtool for some newer commands (e.g. features) > and obviously for those where the list of available options depends on > the device (e.g. private flags or statistics). I would like to extend > the principle also to older commands and new ones which do not work like > this (e.g. device reset). Let me try to argue that's the wrong direction. People should learn to update their user space tooling if they want access to new features. In my (limited) experience trying to solve forward compatibility leads to short term gains, and long term warts in the APIs and increased maintenance burden in the kernel.
Re: [PATCH] iio: cros_ec_accel_legacy: Mark expected switch fall-throughs
On Wed, 20 Feb 2019 10:20:39 -0800 Kees Cook wrote: > On Mon, Oct 8, 2018 at 10:24 AM Gustavo A. R. Silva > wrote: > > > > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > > where we are expecting to fall through. > > > > Addresses-Coverity-ID: 1397962 ("Missing break in switch") > > Signed-off-by: Gustavo A. R. Silva > > --- > > drivers/iio/accel/cros_ec_accel_legacy.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c > > b/drivers/iio/accel/cros_ec_accel_legacy.c > > index 063e89e..d609654 100644 > > --- a/drivers/iio/accel/cros_ec_accel_legacy.c > > +++ b/drivers/iio/accel/cros_ec_accel_legacy.c > > @@ -385,8 +385,10 @@ static int cros_ec_accel_legacy_probe(struct > > platform_device *pdev) > > switch (i) { > > case X: > > ec_accel_channels[X].scan_index = Y; > > + /* fall through */ > > case Y: > > ec_accel_channels[Y].scan_index = X; > > + /* fall through */ > > case Z: > > ec_accel_channels[Z].scan_index = Z; > > } > > Shouldn't these actually be "break;"s ? It seems like the loop is > stepping through X, Y, and Z. The _result_ is accidentally the same: > > X: set X, Y, and Z > Y: set Y and Z > Z: set Z > > result: X, Y, and Z are set correctly. But the code is technically wrong. > Agreed, it's 'novel'. Waiting for Gwendal or someone else to come back and check it wasn't meant to be doing something else. Jonathan >
Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case
On Wed, 20 Feb 2019 12:10:31 -0600 Tom Zanussi wrote: > > As far as I understand it (there's no other case of an xfail test in > the testsuite, so nothing similar to compare it to), the test output is > correct - here we get the expected fail, XFAIL, and not a FAIL as any > test, xfail or normal, that failed would produce: Yeah, I've been staring at the code, and commit: 915de2adb584a ftracetest: Add POSIX.3 standard and XFAIL result codes > > tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/ > === Ftrace unit tests === > [1] event trigger - test inter-event histogram trigger expected fail actions > [XFAIL] > [2] event trigger - test extended error support > [PASS] > > And here the summary shows none failed, while we did have one expected > xfail, but that's what was expected, and not a failure: > > # of passed: 31 > # of failed: 0 > # of unresolved: 0 > # of untested: 0 > # of unsupported: 0 > # of xfailed: 1 Yeah, but it's marked as RED, which is why I thought it was a failure. > # of undefined(test bug): 0 > > If that's not correct, I'll fix it but at this point I'm not sure what > the output should be if not that. OK, so this has nothing to do with your patch set. I've tested everything else, and I'm ready to finally push my tree to linux-next. I'm thinking that we should get rid of xfail, as it's really confusing, and I don't understand its purpose. But that shouldn't stop pushing your patches. Thanks, -- Steve
Re: [RFC 0/5] RCU fixes for rcu_assign_pointer usage
On Wed, Feb 20, 2019 at 01:09:52PM -0500, Joel Fernandes wrote: > On Wed, Feb 20, 2019 at 08:42:43AM -0800, Paul E. McKenney wrote: > > On Tue, Feb 19, 2019 at 08:11:36PM -0800, Joel Fernandes wrote: > > > On Tue, Feb 19, 2019 at 8:08 PM Joel Fernandes (Google) > > > wrote: > > > > > > > > These patches fix various RCU API usage issues found due to sparse > > > > errors as a > > > > result of the recent check to add rcu_check_sparse() to > > > > rcu_assign_pointer(). > > > > > > > > This is very early RFC stage, and is only build tested. I am also only > > > > sending > > > > to the RCU group for initial review before sending to LKML. Thanks for > > > > any feedback! > > > > > > > > There are still more usages that cause errors such as rbtree which I am > > > > looking into. > > > > > > Looks like it got sent to LKML anyway, ;-) That's Ok since it is > > > prefixed as RFC. > > > > As is only right and proper. ;-) > > > > I don't see an immediate problem with them, but it would be good to get > > the relevant developers and maintainers on CC for the next version. I > > cannot claim to know that code very well. > > Definitely will CC them next time, sorry about that. I'll stop being so shy > but I have some scars that are still healing ;-) If you don't get at least a few scars, you aren't trying hard enough! ;-) Thanx, Paul
Re: [PATCHv6 00/10] Heterogenous memory node attributes
On Thu, Feb 14, 2019 at 10:10:07AM -0700, Keith Busch wrote: > Platforms may provide multiple types of cpu attached system memory. The > memory ranges for each type may have different characteristics that > applications may wish to know about when considering what node they want > their memory allocated from. > > It had previously been difficult to describe these setups as memory > ranges were generally lumped into the NUMA node of the CPUs. New > platform attributes have been created and are in use today that describe > the more complex memory hierarchies that can be created. > > This series' objective is to provide the attributes from such systems > that are useful for applications to know about, and readily usable with > existing tools and libraries. Those applications may query performance > attributes relative to a particular CPU they're running on in order to > make more informed choices for where they want to allocate hot and cold > data. This works with mbind() or the numactl library. Hi all, So this seems very calm at this point. Unless there are any late concerns or suggestions, could we open consideration for queueing in a staging tree for a future merge window? 
Thanks, Keith > Keith Busch (10): > acpi: Create subtable parsing infrastructure > acpi: Add HMAT to generic parsing tables > acpi/hmat: Parse and report heterogeneous memory > node: Link memory nodes to their compute nodes > node: Add heterogenous memory access attributes > node: Add memory-side caching attributes > acpi/hmat: Register processor domain to its memory > acpi/hmat: Register performance attributes > acpi/hmat: Register memory side cache attributes > doc/mm: New documentation for memory performance > > Documentation/ABI/stable/sysfs-devices-node | 89 +++- > Documentation/admin-guide/mm/numaperf.rst | 164 +++ > arch/arm64/kernel/acpi_numa.c | 2 +- > arch/arm64/kernel/smp.c | 4 +- > arch/ia64/kernel/acpi.c | 12 +- > arch/x86/kernel/acpi/boot.c | 36 +- > drivers/acpi/Kconfig | 1 + > drivers/acpi/Makefile | 1 + > drivers/acpi/hmat/Kconfig | 9 + > drivers/acpi/hmat/Makefile| 1 + > drivers/acpi/hmat/hmat.c | 677 > ++ > drivers/acpi/numa.c | 16 +- > drivers/acpi/scan.c | 4 +- > drivers/acpi/tables.c | 76 ++- > drivers/base/Kconfig | 8 + > drivers/base/node.c | 351 - > drivers/irqchip/irq-gic-v2m.c | 2 +- > drivers/irqchip/irq-gic-v3-its-pci-msi.c | 2 +- > drivers/irqchip/irq-gic-v3-its-platform-msi.c | 2 +- > drivers/irqchip/irq-gic-v3-its.c | 6 +- > drivers/irqchip/irq-gic-v3.c | 10 +- > drivers/irqchip/irq-gic.c | 4 +- > drivers/mailbox/pcc.c | 2 +- > include/linux/acpi.h | 6 +- > include/linux/node.h | 60 ++- > 25 files changed, 1480 insertions(+), 65 deletions(-) > create mode 100644 Documentation/admin-guide/mm/numaperf.rst > create mode 100644 drivers/acpi/hmat/Kconfig > create mode 100644 drivers/acpi/hmat/Makefile > create mode 100644 drivers/acpi/hmat/hmat.c
Re: BUG: assuming atomic context at kernel/seccomp.c:LINE
On Wed, Feb 20, 2019 at 2:00 AM Daniel Borkmann wrote: > > On 02/20/2019 10:32 AM, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit:abf446c90405 Add linux-next specific files for 20190220 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f250d8c0 > > kernel config: https://syzkaller.appspot.com/x/.config?x=463cb576ac40e350 > > dashboard link: https://syzkaller.appspot.com/bug?extid=8bf19ee2aa580de7a2a7 > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com > > > > BUG: assuming atomic context at kernel/seccomp.c:271 > > in_atomic(): 0, irqs_disabled(): 0, pid: 12803, name: syz-executor.5 > > no locks held by syz-executor.5/12803. > > CPU: 1 PID: 12803 Comm: syz-executor.5 Not tainted 5.0.0-rc7-next-20190220 > > #39 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x172/0x1f0 lib/dump_stack.c:113 > > __cant_sleep kernel/sched/core.c:6218 [inline] > > __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6195 > > seccomp_run_filters kernel/seccomp.c:271 [inline] > > __seccomp_filter+0x12b/0x12b0 kernel/seccomp.c:801 > > __secure_computing+0x101/0x360 kernel/seccomp.c:932 > > syscall_trace_enter+0x5bf/0xe10 arch/x86/entry/common.c:120 > > do_syscall_64+0x479/0x610 arch/x86/entry/common.c:280 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > False positive; bpf-next only. 
Pushing this out in a bit: > > From d56547070162a105ff666f3324e558fa6492aedd Mon Sep 17 00:00:00 2001 > From: Daniel Borkmann > Date: Wed, 20 Feb 2019 10:51:17 +0100 > Subject: [PATCH bpf-next] bpf, seccomp: fix false positive preemption splat > for > cbpf->ebpf progs > > In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > a check was added for BPF_PROG_RUN() that for every invocation preemption is > disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > not count for seccomp because only cBPF -> eBPF is loaded here and it does not > make use of any functionality that would require this assertion. Fix this > false > positive by adding and using __BPF_PROG_RUN() variant that does not have the > cant_sleep(); check. > > Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption > disabled") > Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com > Signed-off-by: Daniel Borkmann Acked-by: Kees Cook -Kees > --- > include/linux/filter.h | 9 - > kernel/seccomp.c | 2 +- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index f32b3ec..2f3e29a 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -533,7 +533,14 @@ struct sk_filter { > struct bpf_prog *prog; > }; > > -#define BPF_PROG_RUN(filter, ctx) ({ cant_sleep(); > (*(filter)->bpf_func)(ctx, (filter)->insnsi); }) > +#define bpf_prog_run__non_preempt(prog, ctx) \ > + ({ cant_sleep(); __BPF_PROG_RUN(prog, ctx); }) > +/* Native eBPF or cBPF -> eBPF transitions. Preemption must be disabled. */ > +#define BPF_PROG_RUN(prog, ctx)\ > + bpf_prog_run__non_preempt(prog, ctx) > +/* cBPF -> eBPF only, but not for native eBPF. 
*/ > +#define __BPF_PROG_RUN(prog, ctx) \ > + (*(prog)->bpf_func)(ctx, (prog)->insnsi) > > #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index e815781..826d4e4 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -268,7 +268,7 @@ static u32 seccomp_run_filters(const struct seccomp_data > *sd, > * value always takes priority (ignoring the DATA). > */ > for (; f; f = f->prev) { > - u32 cur_ret = BPF_PROG_RUN(f->prog, sd); > + u32 cur_ret = __BPF_PROG_RUN(f->prog, sd); > > if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) { > ret = cur_ret; > -- > 2.9.5 -- Kees Cook
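The macro layering in the patch above can be tried out in user space. The following is only a sketch under stated assumptions: struct prog, the cant_sleep() stub with its call counter, and ret_allow() are hypothetical stand-ins for the kernel's struct bpf_prog, the real cant_sleep() debug check, and a filter program. It relies on GNU C statement expressions, as the kernel macros do, so it needs gcc or clang.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_prog. */
struct prog {
	uint32_t (*bpf_func)(const void *ctx, const void *insnsi);
	const void *insnsi;
};

/* Stub for the kernel's cant_sleep(): only counts how often it runs. */
static int cant_sleep_calls;
static void cant_sleep(void) { cant_sleep_calls++; }

/* cBPF -> eBPF only: run the program without the preemption check. */
#define __BPF_PROG_RUN(prog, ctx) \
	(*(prog)->bpf_func)(ctx, (prog)->insnsi)

/* Native eBPF: check the calling context first, then run the program. */
#define BPF_PROG_RUN(prog, ctx) \
	({ cant_sleep(); __BPF_PROG_RUN(prog, ctx); })

/* A trivial "filter" that always returns SECCOMP_RET_ALLOW. */
static uint32_t ret_allow(const void *ctx, const void *insnsi)
{
	(void)ctx;
	(void)insnsi;
	return 0x7fff0000u;
}
```

Calling __BPF_PROG_RUN() on a struct prog that points at ret_allow() leaves cant_sleep_calls untouched, while BPF_PROG_RUN() increments it once per invocation. That is the distinction the seccomp path relies on: it keeps running the filter but no longer trips the assertion.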
Re: [PATCH] iio: cros_ec_accel_legacy: Mark expected switch fall-throughs
On Mon, Oct 8, 2018 at 10:24 AM Gustavo A. R. Silva wrote: > > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. > > Addresses-Coverity-ID: 1397962 ("Missing break in switch") > Signed-off-by: Gustavo A. R. Silva > --- > drivers/iio/accel/cros_ec_accel_legacy.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c > b/drivers/iio/accel/cros_ec_accel_legacy.c > index 063e89e..d609654 100644 > --- a/drivers/iio/accel/cros_ec_accel_legacy.c > +++ b/drivers/iio/accel/cros_ec_accel_legacy.c > @@ -385,8 +385,10 @@ static int cros_ec_accel_legacy_probe(struct > platform_device *pdev) > switch (i) { > case X: > ec_accel_channels[X].scan_index = Y; > + /* fall through */ > case Y: > ec_accel_channels[Y].scan_index = X; > + /* fall through */ > case Z: > ec_accel_channels[Z].scan_index = Z; > } Shouldn't these actually be "break;"s ? It seems like the loop is stepping through X, Y, and Z. The _result_ is accidentally the same: X: set X, Y, and Z Y: set Y and Z Z: set Z result: X, Y, and Z are set correctly. But the code is technically wrong. -- Kees Cook
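Kees's observation can be checked with a small user-space sketch. It assumes, as his reply does, that the switch sits inside a loop stepping through X, Y and Z; the function names and the plain int array are hypothetical and stand in for ec_accel_channels[].scan_index.

```c
enum { X, Y, Z, MAX_AXIS };

/* Mirrors the switch as posted: every case falls through to the next. */
static void map_with_fallthrough(int scan_index[MAX_AXIS])
{
	for (int i = X; i < MAX_AXIS; i++) {
		switch (i) {
		case X:
			scan_index[X] = Y;
			/* fall through */
		case Y:
			scan_index[Y] = X;
			/* fall through */
		case Z:
			scan_index[Z] = Z;
		}
	}
}

/* The variant Kees suggests: each iteration sets exactly one axis. */
static void map_with_break(int scan_index[MAX_AXIS])
{
	for (int i = X; i < MAX_AXIS; i++) {
		switch (i) {
		case X:
			scan_index[X] = Y;
			break;
		case Y:
			scan_index[Y] = X;
			break;
		case Z:
			scan_index[Z] = Z;
			break;
		}
	}
}
```

Both functions leave the array as { Y, X, Z }. The fall-through version simply repeats the Y and Z assignments on earlier iterations, which is why the result is the same even though the control flow is not what the loop suggests.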
[PATCH v2 1/2] soc: bcm: bcm2835-pm: Fix PM_IMAGE_PERI power domain support.
We don't have ASB master/slave regs for this domain, so just skip that step. Signed-off-by: Eric Anholt Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.") --- drivers/soc/bcm/bcm2835-power.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c index 48412957ec7a..4a1b99b773c0 100644 --- a/drivers/soc/bcm/bcm2835-power.c +++ b/drivers/soc/bcm/bcm2835-power.c @@ -150,7 +150,12 @@ struct bcm2835_power { static int bcm2835_asb_enable(struct bcm2835_power *power, u32 reg) { - u64 start = ktime_get_ns(); + u64 start; + + if (!reg) + return 0; + + start = ktime_get_ns(); /* Enable the module's async AXI bridges. */ ASB_WRITE(reg, ASB_READ(reg) & ~ASB_REQ_STOP); @@ -165,7 +170,12 @@ static int bcm2835_asb_enable(struct bcm2835_power *power, u32 reg) static int bcm2835_asb_disable(struct bcm2835_power *power, u32 reg) { - u64 start = ktime_get_ns(); + u64 start; + + if (!reg) + return 0; + + start = ktime_get_ns(); /* Enable the module's async AXI bridges. */ ASB_WRITE(reg, ASB_READ(reg) | ASB_REQ_STOP); -- 2.20.1
Re: [PATCH 2/2] soc: bcm: bcm2835-pm: Fix error paths of initialization.
Florian Fainelli writes: > On 2/13/19 2:33 PM, Stefan Wahren wrote: >> >>> Eric Anholt wrote on 13 February 2019 at 19:28: >>> >>> >>> Stefan Wahren writes: >>> Hi Eric, On 13.02.19 at 01:33, Eric Anholt wrote: > The clock driver may probe after ours and so we need to pass the > -EPROBE_DEFER out. Fix the other error path while we're here. > > Signed-off-by: Eric Anholt > Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains > under a new binding.") > --- > drivers/soc/bcm/bcm2835-power.c | 30 +- > 1 file changed, 25 insertions(+), 5 deletions(-) > > diff --git a/drivers/soc/bcm/bcm2835-power.c > b/drivers/soc/bcm/bcm2835-power.c > index 4a1b99b773c0..11f9469423f7 100644 > --- a/drivers/soc/bcm/bcm2835-power.c > +++ b/drivers/soc/bcm/bcm2835-power.c > @@ -485,7 +485,7 @@ static int bcm2835_power_pd_power_off(struct > generic_pm_domain *domain) > } > } > > -static void > +static int > bcm2835_init_power_domain(struct bcm2835_power *power, > int pd_xlate_index, const char *name) > { > @@ -493,6 +493,12 @@ bcm2835_init_power_domain(struct bcm2835_power > *power, > struct bcm2835_power_domain *dom = &power->domains[pd_xlate_index]; > > dom->clk = devm_clk_get(dev->parent, name); > + if (IS_ERR(dom->clk)) { > + int ret = PTR_ERR(dom->clk); > + > + if (ret == -EPROBE_DEFER) > + return ret; is it safe to proceed in the other error cases? Even it would be more consistent with clk_prepare_enable() to print an error here. >>> >>> Yes, not all domains have a clk, so we want to ignore the other error. >> >> But shouldn't we set dom->clk to NULL instead of keeping the error >> pointer? AFAIK clk_prepare_enable is aware of NULL instead of error >> pointer. > > If the clock is really optional, then yes, this should be the way to go. Sigh, error pointers. Fixed, sending a v2.
[PATCH v2 2/2] soc: bcm: bcm2835-pm: Fix error paths of initialization.
The clock driver may probe after ours and so we need to pass the -EPROBE_DEFER out. Fix the other error path while we're here. v2: Use dom->name instead of dom->gov as the flag for initialized domains, since we aren't setting up a governor. Make sure to clear ->clk when no clk is present in the DT. Signed-off-by: Eric Anholt Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under a new binding.") --- drivers/soc/bcm/bcm2835-power.c | 35 - 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c index 4a1b99b773c0..241c4ed80899 100644 --- a/drivers/soc/bcm/bcm2835-power.c +++ b/drivers/soc/bcm/bcm2835-power.c @@ -485,7 +485,7 @@ static int bcm2835_power_pd_power_off(struct generic_pm_domain *domain) } } -static void +static int bcm2835_init_power_domain(struct bcm2835_power *power, int pd_xlate_index, const char *name) { @@ -493,6 +493,17 @@ bcm2835_init_power_domain(struct bcm2835_power *power, struct bcm2835_power_domain *dom = &power->domains[pd_xlate_index]; dom->clk = devm_clk_get(dev->parent, name); + if (IS_ERR(dom->clk)) { + int ret = PTR_ERR(dom->clk); + + if (ret == -EPROBE_DEFER) + return ret; + + /* Some domains don't have a clk, so make sure that we +* don't deref an error pointer later. 
+*/ + dom->clk = NULL; + } dom->base.name = name; dom->base.power_on = bcm2835_power_pd_power_on; @@ -505,6 +516,8 @@ bcm2835_init_power_domain(struct bcm2835_power *power, pm_genpd_init(&dom->base, NULL, true); power->pd_xlate.domains[pd_xlate_index] = &dom->base; + + return 0; } /** bcm2835_reset_reset - Resets a block that has a reset line in the @@ -602,7 +615,7 @@ static int bcm2835_power_probe(struct platform_device *pdev) { BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM0 }, { BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM1 }, }; - int ret, i; + int ret = 0, i; u32 id; power = devm_kzalloc(dev, sizeof(*power), GFP_KERNEL); @@ -629,8 +642,11 @@ static int bcm2835_power_probe(struct platform_device *pdev) power->pd_xlate.num_domains = ARRAY_SIZE(power_domain_names); - for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) - bcm2835_init_power_domain(power, i, power_domain_names[i]); + for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) { + ret = bcm2835_init_power_domain(power, i, power_domain_names[i]); + if (ret) + goto fail; + } for (i = 0; i < ARRAY_SIZE(domain_deps); i++) { pm_genpd_add_subdomain(&power->domains[domain_deps[i].parent].base, @@ -644,12 +660,21 @@ static int bcm2835_power_probe(struct platform_device *pdev) ret = devm_reset_controller_register(dev, &power->reset); if (ret) - return ret; + goto fail; of_genpd_add_provider_onecell(dev->parent->of_node, &power->pd_xlate); dev_info(dev, "Broadcom BCM2835 power domains driver"); return 0; + +fail: + for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) { + struct generic_pm_domain *dom = &power->domains[i].base; + + if (dom->name) + pm_genpd_remove(dom); + } + return ret; } static int bcm2835_power_remove(struct platform_device *pdev) -- 2.20.1
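The error-pointer handling that prompted the v2 can be illustrated outside the kernel. This is a sketch only: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here in simplified form, struct domain and init_domain() are hypothetical, and the clk argument stands in for the return value of devm_clk_get(). EPROBE_DEFER is a kernel-internal errno value (517) that user-space <errno.h> does not define.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal, absent from user-space errno.h */
#endif
#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct bcm2835_power_domain. */
struct domain {
	void *clk;
	const char *name;	/* non-NULL once the domain is initialized */
};

/* clk plays the role of the value returned by devm_clk_get(). */
static int init_domain(struct domain *dom, void *clk, const char *name)
{
	dom->clk = clk;
	if (IS_ERR(dom->clk)) {
		int ret = (int)PTR_ERR(dom->clk);

		/* The clock provider is not ready yet: retry the probe later */
		if (ret == -EPROBE_DEFER)
			return ret;

		/* The clock is optional: never keep an error pointer around */
		dom->clk = NULL;
	}
	dom->name = name;
	return 0;
}
```

Passing ERR_PTR(-EPROBE_DEFER) returns the error and leaves name unset, so a cleanup loop keyed on name (as in the fail: path above) would skip that domain. Passing ERR_PTR(-ENOENT) succeeds with clk set to NULL, which is the case Stefan asked about.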
Re: [PATCH] intel_th: Mark expected switch fall-throughs
Hi all, Friendly ping: Who can take this? Thanks -- Gustavo On 2/12/19 3:43 PM, Gustavo A. R. Silva wrote: > In preparation to enabling -Wimplicit-fallthrough, mark switch > cases where we are expecting to fall through. > > This patch fixes the following warnings: > > drivers/hwtracing/intel_th/sth.c: In function ‘sth_stm_packet’: > drivers/hwtracing/intel_th/sth.c:86:7: warning: this statement may fall > through [-Wimplicit-fallthrough=] >reg += 4; >^~~~ > drivers/hwtracing/intel_th/sth.c:87:2: note: here > case STP_PACKET_XSYNC: > ^~~~ > drivers/hwtracing/intel_th/sth.c:88:7: warning: this statement may fall > through [-Wimplicit-fallthrough=] >reg += 8; >^~~~ > drivers/hwtracing/intel_th/sth.c:89:2: note: here > case STP_PACKET_TRIG: > ^~~~ > > Warning level 3 was used: -Wimplicit-fallthrough=3 > > This patch is part of the ongoing efforts to enable > -Wimplicit-fallthrough. > > Signed-off-by: Gustavo A. R. Silva > --- > drivers/hwtracing/intel_th/sth.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/hwtracing/intel_th/sth.c > b/drivers/hwtracing/intel_th/sth.c > index 4b7ae47789d2..3a1f4e650378 100644 > --- a/drivers/hwtracing/intel_th/sth.c > +++ b/drivers/hwtracing/intel_th/sth.c > @@ -84,8 +84,12 @@ static ssize_t notrace sth_stm_packet(struct stm_data > *stm_data, > /* Global packets (GERR, XSYNC, TRIG) are sent with register writes */ > case STP_PACKET_GERR: > reg += 4; > + /* fall through */ > + > case STP_PACKET_XSYNC: > reg += 8; > + /* fall through */ > + > case STP_PACKET_TRIG: > if (flags & STP_PACKET_TIMESTAMPED) > reg += 4; >
[PATCH] net: dsa: add missing phy address offset
When PHYs do not start at address 0, as on the mv88e6341, the wrong PHY address is used and therefore the slave ports cannot be initialized. This patch adds the proper offset to the PHY address. Signed-off-by: Marcel Reichmuth --- drivers/net/dsa/mv88e6xxx/chip.c | 3 +++ include/net/dsa.h | 1 + net/dsa/slave.c | 3 ++- 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 12fd7ce3f1ff..0ca649f784d2 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2198,12 +2198,15 @@ static int mv88e6xxx_setup_upstream_port(struct mv88e6xxx_chip *chip, int port) static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) { struct dsa_switch *ds = chip->ds; + struct dsa_port *dp = &ds->ports[port]; int err; u16 reg; chip->ports[port].chip = chip; chip->ports[port].port = port; + dp->phy_base_addr = chip->info->phy_base_addr; + /* MAC Forcing register: don't force link, speed, duplex or flow control * state to any particular values on physical ports, but force the CPU * port and all DSA ports to their maximum bandwidth and full duplex. 
diff --git a/include/net/dsa.h b/include/net/dsa.h index b3eefe8e18fd..f9c9dc1f6d21 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -196,6 +196,7 @@ struct dsa_port { struct dsa_switch *ds; unsigned int index; + unsigned int phy_base_addr; const char *name; const struct dsa_port *cpu_dp; struct device_node *dn; diff --git a/net/dsa/slave.c b/net/dsa/slave.c index a1c9fe155057..4f67dff34a3b 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1221,7 +1221,8 @@ static int dsa_slave_phy_setup(struct net_device *slave_dev) /* We could not connect to a designated PHY or SFP, so use the * switch internal MDIO bus instead */ - ret = dsa_slave_phy_connect(slave_dev, dp->index); + ret = dsa_slave_phy_connect(slave_dev, dp->phy_base_addr + + dp->index); if (ret) { netdev_err(slave_dev, "failed to connect to port %d: %d\n", -- 2.11.0
Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case
Hi Steve, On Wed, 2019-02-20 at 12:56 -0500, Steven Rostedt wrote: > On Wed, 20 Feb 2019 11:38:22 -0600 > Tom Zanussi wrote: > > > Hi Steve, > > > > On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote: > > > On Wed, 13 Feb 2019 17:42:55 -0600 > > > Tom Zanussi wrote: > > > > > > > From: Tom Zanussi > > > > > > > > Add a test case verifying that basic action combinations fail > > > > as > > > > expected. > > > > > > > > > > Hi Tom, > > > > > > This test appears to fail: > > > > > > # echo > > > 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > > > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger > > > > > > -bash: echo: write error: Invalid argument > > > > > > # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist > > > > > > ERROR: action parsing: Handler doesn't support action: save > > > Last command: > > > keys=comm:onmatch(sched.sched_wakeup).save(comm,prio) > > > > > > > > > Is the "save" feature implemented here? It's in the README too. > > > Should > > > it be removed? > > > > > > > The "save" feature is implemented, but it's not currently supported > > with onmatch(), which is why it fails, and is used in the xfail > > test, > > since it's expected to. So, in this case, the command fails, which > > means the xfail test actually passed. ;-) > > > > There are other tests in the inter-event testcases that use save() > > but > > with onmax() and onchange(), and they pass. > > So the test needs to pass on failure? > > Because, it shouldn't be flagged as a failure in the test suite. 
> As far as I understand it (there's no other case of an xfail test in the testsuite, so nothing similar to compare it to), the test output is correct - here we get the expected fail, XFAIL, and not a FAIL as any test, xfail or normal, that failed would produce: tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/ === Ftrace unit tests === [1] event trigger - test inter-event histogram trigger expected fail actions [XFAIL] [2] event trigger - test extended error support [PASS] And here the summary shows none failed, while we did have one expected xfail, but that's what was expected, and not a failure: # of passed: 31 # of failed: 0 # of unresolved: 0 # of untested: 0 # of unsupported: 0 # of xfailed: 1 # of undefined(test bug): 0 If that's not correct, I'll fix it but at this point I'm not sure what the output should be if not that. Thanks, Tom > -- Steve > > > > > Hope that explains things in this case, > > > > Tom > > > > > -- Steve > > > > > > > Signed-off-by: Tom Zanussi > > > > --- > > > > .../inter-event/trigger-action-hist-xfail.tc | 30 > > > > ++ > > > > 1 file changed, 30 insertions(+) > > > > create mode 100644 > > > > tools/testing/selftests/ftrace/test.d/trigger/inter- > > > > event/trigger- > > > > action-hist-xfail.tc > > > > > > > > diff --git > > > > a/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > > event/trigger-action-hist-xfail.tc > > > > b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > > event/trigger-action-hist-xfail.tc > > > > new file mode 100644 > > > > index ..1221240f8cf6 > > > > --- /dev/null > > > > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > > event/trigger-action-hist-xfail.tc > > > > @@ -0,0 +1,30 @@ > > > > +#!/bin/sh > > > > +# SPDX-License-Identifier: GPL-2.0 > > > > +# description: event trigger - test inter-event histogram > > > > trigger > > > > expected fail actions > > > > + > > > > +fail() { #msg > > > > +echo $1 > > > > +exit_fail > > > > +} > > > > + > > > > +if [ ! 
-f set_event ]; then > > > > +echo "event tracing is not supported" > > > > +exit_unsupported > > > > +fi > > > > + > > > > +if [ ! -f snapshot ]; then > > > > +echo "snapshot is not supported" > > > > +exit_unsupported > > > > +fi > > > > + > > > > +grep -q "snapshot()" README || exit_unsupported # version > > > > issue > > > > + > > > > +echo "Test expected snapshot action failure" > > > > + > > > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' > > > > >> > > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && > > > > exit_fail > > > > + > > > > +echo "Test expected save action failure" > > > > + > > > > +echo > > > > 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > > > > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger > > > > > > && > > > > > > > > exit_fail > > > > + > > > > +exit_xfail > > > > > > > >
Re: [PATCH] crypto: ccree - fix missing break in switch statement
Hi all, Friendly ping: Who can take this? Thanks -- Gustavo On 2/11/19 12:31 PM, Gustavo A. R. Silva wrote: > Add missing break statement in order to prevent the code from falling > through to case S_DIN_to_DES. > > This bug was found thanks to the ongoing efforts to enable > -Wimplicit-fallthrough. > > Fixes: 63ee04c8b491 ("crypto: ccree - add skcipher support") > Cc: sta...@vger.kernel.org > Signed-off-by: Gustavo A. R. Silva > --- > drivers/crypto/ccree/cc_cipher.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/crypto/ccree/cc_cipher.c > b/drivers/crypto/ccree/cc_cipher.c > index 5e3361a363b5..d9c17078517b 100644 > --- a/drivers/crypto/ccree/cc_cipher.c > +++ b/drivers/crypto/ccree/cc_cipher.c > @@ -80,6 +80,7 @@ static int validate_keys_sizes(struct cc_cipher_ctx *ctx_p, > u32 size) > default: > break; > } > + break; > case S_DIN_to_DES: > if (size == DES3_EDE_KEY_SIZE || size == DES_KEY_SIZE) > return 0; >
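Why the added break matters is easy to miss, because the inner switch already ends in "default: break;". That break only leaves the inner switch. The sketch below shows the pattern in user space under stated assumptions: enum flow, validate_key_size() and the accepted AES sizes are hypothetical and do not reproduce the ccree driver's actual flow modes or checks.

```c
#include <errno.h>

/* Hypothetical flow modes, standing in for the driver's S_DIN_to_* values. */
enum flow { FLOW_AES, FLOW_DES };

#define DES_KEY_SIZE		8
#define DES3_EDE_KEY_SIZE	24

/* Returns 0 if size is acceptable for the given flow, -EINVAL otherwise. */
static int validate_key_size(enum flow flow, unsigned int size)
{
	switch (flow) {
	case FLOW_AES:
		switch (size) {
		case 16:
		case 24:
		case 32:
			return 0;
		default:
			break;	/* leaves the inner switch only */
		}
		break;		/* the statement the patch adds */
	case FLOW_DES:
		if (size == DES3_EDE_KEY_SIZE || size == DES_KEY_SIZE)
			return 0;
		break;
	}
	return -EINVAL;
}
```

In this sketch, removing the outer break would let an AES request with an 8-byte key drop into the DES case and be accepted, which is the kind of silent fall-through that -Wimplicit-fallthrough is meant to flag.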
Re: [RFC 0/5] RCU fixes for rcu_assign_pointer usage
On Wed, Feb 20, 2019 at 08:42:43AM -0800, Paul E. McKenney wrote: > On Tue, Feb 19, 2019 at 08:11:36PM -0800, Joel Fernandes wrote: > > On Tue, Feb 19, 2019 at 8:08 PM Joel Fernandes (Google) > > wrote: > > > > > > These patches fix various RCU API usage issues found due to sparse errors > > > as a > > > result of the recent check to add rcu_check_sparse() to > > > rcu_assign_pointer(). > > > > > > This is very early RFC stage, and is only build tested. I am also only > > > sending > > > to the RCU group for initial review before sending to LKML. Thanks for > > > any feedback! > > > > > > There are still more usages that cause errors such as rbtree which I am > > > looking into. > > > > Looks like it got sent to LKML anyway, ;-) That's Ok since it is > > prefixed as RFC. > > As is only right and proper. ;-) > > I don't see an immediate problem with them, but it would be good to get > the relevant developers and maintainers on CC for the next version. I > cannot claim to know that code very well. Definitely will CC them next time, sorry about that. I'll stop being so shy but I have some scars that are still healing ;-) - Joel
[PATCH v2] x86/asm: Pin sensitive CR4 bits
Several recent exploits have used direct calls to the native_write_cr4() function to disable SMEP and SMAP before then continuing their exploits using userspace memory access. This pins bits of cr4 so that they cannot be changed through a common function. This is not intended to be general ROP protection (which would require CFI to defend against properly), but rather a way to avoid trivial direct function calling (or CFI bypassing via a matching function prototype) as seen in: https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html (https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308) The goals of this change: - pin specific bits (SMEP, SMAP, and UMIP) when writing cr4. - avoid setting the bits too early (they must become pinned only after first being used). - pinning mask needs to be read-only during normal runtime. - pinning needs to be rechecked after set to avoid jumps into the middle of the function. Using __ro_after_init on the mask is done so it can't be first disabled with a malicious write. And since it becomes read-only, we must avoid writing to it later (hence the check for bits already having been set instead of unconditionally writing to the mask). The use of volatile is done to force the compiler to perform a full reload of the mask after setting cr4 (to protect against just jumping into the function past where the masking happens; we must check that the mask was applied after we do the set). Due to how this function can be built by the compiler (especially due to the removal of frame pointers), jumping into the middle of the function frequently doesn't require stack manipulation to construct a stack frame (there may only be a retq without pops, which is sufficient for use with exploits like timer overwrites mentioned above). 
For example, without the recheck, the function may appear as: native_write_cr4: mov [pin], %rbx or %rbx, %rdi 1: mov %rdi, %cr4 retq The masking "or" could be trivially bypassed by just calling to label "1" instead of "native_write_cr4". (CFI will force calls to only be able to call into native_write_cr4, but CFI and CET are uncommon currently.) Signed-off-by: Kees Cook --- v2: fix think-o in cr4_pin recheck (Jann Horn) --- arch/x86/include/asm/special_insns.h | 11 +++ arch/x86/kernel/cpu/common.c | 12 +++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 43c029cdc3fe..4c26004ed5d4 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -72,9 +72,20 @@ static inline unsigned long native_read_cr4(void) return val; } +extern volatile unsigned long cr4_pin; + static inline void native_write_cr4(unsigned long val) { +again: + val |= cr4_pin; asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order)); + /* +* If the MOV above was used directly as a ROP gadget we can +* notice the lack of pinned bits in "val" and start the function +* from the beginning to gain the cr4_pin bits for sure. 
+*/ + if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n")) + goto again; } #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index cb28e98a0659..7e0ea4470f8e 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -312,10 +312,16 @@ static __init int setup_disable_smep(char *arg) } __setup("nosmep", setup_disable_smep); +volatile unsigned long cr4_pin __ro_after_init; +EXPORT_SYMBOL_GPL(cr4_pin); + static __always_inline void setup_smep(struct cpuinfo_x86 *c) { - if (cpu_has(c, X86_FEATURE_SMEP)) + if (cpu_has(c, X86_FEATURE_SMEP)) { + if (!(cr4_pin & X86_CR4_SMEP)) + cr4_pin |= X86_CR4_SMEP; cr4_set_bits(X86_CR4_SMEP); + } } static __init int setup_disable_smap(char *arg) @@ -334,6 +340,8 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_SMAP)) { #ifdef CONFIG_X86_SMAP + if (!(cr4_pin & X86_CR4_SMAP)) + cr4_pin |= X86_CR4_SMAP; cr4_set_bits(X86_CR4_SMAP); #else cr4_clear_bits(X86_CR4_SMAP); @@ -351,6 +359,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c) if (!cpu_has(c, X86_FEATURE_UMIP)) goto out; + if (!(cr4_pin & X86_CR4_UMIP)) + cr4_pin |= X86_CR4_UMIP; cr4_set_bits(X86_CR4_UMIP); pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) activated\n"); -- 2.17.1 -- Kees Cook
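A minimal userspace sketch of the pin-and-recheck flow described in the patch above, for readers who want to step through the logic outside the kernel. Everything prefixed fake_ or FAKE_ is hypothetical: a plain variable stands in for the cr4 register, an fprintf() stands in for WARN_ONCE(), and the bit positions are only illustrative. It models the control flow, not the actual protection (there is no __ro_after_init and no ROP attacker here).

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins for X86_CR4_UMIP/SMEP/SMAP (assumed bit positions). */
#define FAKE_CR4_UMIP (1UL << 11)
#define FAKE_CR4_SMEP (1UL << 20)
#define FAKE_CR4_SMAP (1UL << 21)

/* Stand-in for cr4_pin: volatile so every read is a fresh load. */
static volatile unsigned long fake_cr4_pin;

/* Stand-in for the cr4 register itself. */
static unsigned long fake_cr4;

/*
 * Mirrors the setup_smep()/setup_smap()/setup_umip() hunks: only write
 * the mask when the bit is not already set, so no store is attempted
 * once the mask would have become read-only.
 */
static void fake_pin_bit(unsigned long bit)
{
	if (!(fake_cr4_pin & bit))
		fake_cr4_pin |= bit;
}

/* Mirrors the patched native_write_cr4(): OR in the pin, write, recheck. */
static void fake_write_cr4(unsigned long val)
{
again:
	val |= fake_cr4_pin;
	fake_cr4 = val;		/* stands in for "mov %0,%%cr4" */

	/*
	 * If execution had entered at the store above with an
	 * attacker-chosen val, the pinned bits would be missing here,
	 * and the loop restarts from the masking step.
	 */
	if ((val & fake_cr4_pin) != fake_cr4_pin) {
		fprintf(stderr, "cr4 bypass attempt?!\n");
		goto again;
	}
}
```

On the normal path the recheck never fires, which is the point: it only matters when the function is entered somewhere other than its first instruction.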
Re: [RFC 1/5] net: rtnetlink: Fix incorrect RCU API usage
On Wed, Feb 20, 2019 at 08:40:34AM -0800, Paul E. McKenney wrote: > On Tue, Feb 19, 2019 at 11:08:23PM -0500, Joel Fernandes (Google) wrote: > > From: Joel Fernandes > > > > rtnl_register_internal() and rtnl_unregister_all() try to directly > > dereference an RCU protected pointer outside RCU read side section. > > While this is Ok to do since a lock is held, let us use the correct > > API to avoid programmer bugs in the future. > > > > This also fixes sparse warnings arising from not using RCU API. > > > > net/core/rtnetlink.c:332:13: warning: incorrect type in assignment > > (different address spaces) net/core/rtnetlink.c:332:13:expected > > struct rtnl_link **tab net/core/rtnetlink.c:332:13:got struct > > rtnl_link *[noderef] * > > > > Signed-off-by: Joel Fernandes > > First, thank you for doing this! No problem, it is my pleasure. It is just good to see these warnings/errors show up (which I didn't anticipate when I first wrote the check) so we can harden the kernel more fwiw. > I was going to complain that these were update-side accesses, but it > looks like rtnl_dereference() already handles both readers and updaters. > > So looks good to me, but the maintainers of course have the final word. Thanks! Also my confidence level is a bit less for patches 4/5 and 5/5, could you share your thoughts on those? The scheduler code seems to use rcu_assign_pointer() in those where it seems a WRITE_ONCE() would just suffice. In fact, in some cases I replaced with smp_store_release() just to be safe. Speaking of which, do you feel those are legit uses of rcu_assign_pointer() or would you expect rcu_assign_pointer() to be used only for RCU protected pointers? I am hoping it is the latter since that is what the sparse check expects (an RCU protected pointer being assigned to).
- Joel > > > --- > > net/core/rtnetlink.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > > index 5ea1bed08ede..98be4b4818a9 100644 > > --- a/net/core/rtnetlink.c > > +++ b/net/core/rtnetlink.c > > @@ -188,7 +188,7 @@ static int rtnl_register_internal(struct module *owner, > > msgindex = rtm_msgindex(msgtype); > > > > rtnl_lock(); > > - tab = rtnl_msg_handlers[protocol]; > > + tab = rtnl_dereference(rtnl_msg_handlers[protocol]); > > if (tab == NULL) { > > tab = kcalloc(RTM_NR_MSGTYPES, sizeof(void *), GFP_KERNEL); > > if (!tab) > > @@ -329,7 +329,7 @@ void rtnl_unregister_all(int protocol) > > BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX); > > > > rtnl_lock(); > > - tab = rtnl_msg_handlers[protocol]; > > + tab = rtnl_dereference(rtnl_msg_handlers[protocol]); > > if (!tab) { > > rtnl_unlock(); > > return; > > -- > > 2.21.0.rc0.258.g878e2cd30e-goog > > >
Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier
+ Evgenii On Wed, Feb 20, 2019 at 9:36 AM Arnd Bergmann wrote: > > On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin > wrote: > > On 2/20/19 5:51 PM, Arnd Bergmann wrote: > > > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov > > > wrote: > > > I would have to some more research, but I expect several hundred > > > patches before we get to a clean randconfig build with a broken > > > compiler. > > > > Manually maintaining asan-stack parameter for the sake of one broken > > compiler isn't a great idea either. > > > > Couple alternative suggestions: > > > > 1) If we can't fix the problem or the cost of fixing is too high, maybe > > just hide it? Disable -Wframe-larger-then on pre clang-9 compilers. > > > > 2) Fallback cflags. The idea is to try to compile every the file with > > "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first, > > and fallback to "-mllvm -asan-stack=0" if failed. So it would be something > > similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 > > -Werror, -mllvm -asan-stack=0) > > except that "cc-option" tries options only once on some code example while > > we need to try options on every file that we actually compile. > > Honestly, I'm not sure that it's worthy to hack Kbuild engine for that > > particular use-case. > > My original plan was to put this under CONFIG_KASAN_EXTRA to allow you > to still enable it in older compilers, but you just removed that option ;-) > > Maybe bringing it back would be a compromise? That way it's hidden from > all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency), > but anyone who really wants it can still have the option, and set > CONFIG_FRAME_WARN > to whichever value they like. > > Arnd I like Evgenii's idea: https://bugs.llvm.org/show_bug.cgi?id=38809#c10 Even though something like that wouldn't make the clang-8 train, I think it's ok. 
While I myself share Arnd's goal of driving compiler warnings to zero, in general I'd prefer not to disable warning-producing-features or disable warnings outright for cases where we have some ideas of changes we can make to the compiler. There's probably a list now of false warnings produced by old versions of Clang from bugs in Clang that we fixed. I'm not interested in additionally trying to work around those somehow in kernel sources. Qian previously pointed out that most drivers don't produce this warning under KASAN+Clang. While 114 is a lot, what are the chances that someone NEEDS a KASAN+Clang build to compile warning free and happens to include one of these problematic drivers? And if there is a chance they do observe the warning, are we doing a disservice by disabling the feature (-asan-stack=1) outright for the whole kernel, or disabling the warning (`-Wframe-larger-than=`) which can flag issues unrelated to KASAN? To Evgenii's idea, I vote that the compiler is incorrect here, and we shouldn't start turning things off. Evgenii, do you have some sense of how to tune the inliner as you described? -- Thanks, ~Nick Desaulniers
Re: [PATCH] iio: mma8452: mark expected switch fall-through
On 2/20/19 11:21 AM, Gustavo A. R. Silva wrote: > > > On 2/20/19 6:17 AM, Jonathan Cameron wrote: >> On Mon, 11 Feb 2019 16:23:18 -0600 >> "Gustavo A. R. Silva" wrote: >> >>> In preparation to enabling -Wimplicit-fallthrough, mark switch >>> cases where we are expecting to fall through. >>> >>> This patch fixes the following warning: >>> >>> drivers/iio/accel/mma8452.c: In function ‘mma8452_probe’: >>> drivers/iio/accel/mma8452.c:1581:6: warning: this statement may fall >>> through [-Wimplicit-fallthrough=] >>>if (ret == data->chip_info->chip_id) >>> ^ >>> drivers/iio/accel/mma8452.c:1584:2: note: here >>> default: >>> ^~~ >>> >>> Warning level 3 was used: -Wimplicit-fallthrough=3 >>> >>> Notice that, in this particular case, the code comment is modified >>> in accordance with what GCC is expecting to find. >>> >>> This patch is part of the ongoing efforts to enable >>> -Wimplicit-fallthrough. >>> >>> Signed-off-by: Gustavo A. R. Silva >> I know Peter probably won't like this, as it doesn't >> read a as well, with the else dropped, but I'm going to take >> it as we have had a lot of bugs caught by this code and this >> is generating a false positive. >> >> Applied to the togreg branch of iio.git and pushed out as testing >> for the autobuilders to play with it. >> > > Thanks, Jonathan. > BTW, Jonathan, I wonder if you can apply this one too: https://lore.kernel.org/patchwork/patch/996804/ Thanks -- Gustavo
Re: [PATCH v2] driver: platform: Support parsing GpioInt 0 in platform_get_irq()
Hi, On Mon, Feb 11, 2019 at 11:01:12AM -0800, egran...@chromium.org wrote: > From: Enrico Granata > > ACPI 5 added support for GpioInt resources as a way to provide > information about interrupts mediated via a GPIO controller. > > Several device buses (e.g. SPI, I2C) have support for retrieving > an IRQ specified via this type of resource, and providing it > directly to the driver as an IRQ number. > > This is not currently done for the platform drivers, as platform_get_irq() > does not try to parse GpioInt() resources. This requires drivers to > either have to support only one possible IRQ resource, or to have code > in place to try both as a failsafe. > > While there is a possibility of ambiguity for devices that exposes > multiple IRQs, it is easy and feasible to support the common case > of devices that only expose one IRQ which would be of either type > depending on the underlying system's architecture. > > This commit adds support for parsing a GpioInt resource in order > to fulfill a request for the index 0 IRQ for a platform device. > > Signed-off-by: Enrico Granata > --- > Changes in v2: > - only support IRQ index 0 > > drivers/base/platform.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index 1c958eb33ef4d..0d3611cd1b3bc 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -127,7 +127,20 @@ int platform_get_irq(struct platform_device *dev, > unsigned int num) > irqd_set_trigger_type(irqd, r->flags & IORESOURCE_BITS); > } > > - return r ? r->start : -ENXIO; > + if (r) > + return r->start; > + > + /* > + * For the index 0 interrupt, allow falling back to GpioInt > + * resources. While a device could have both Interrupt and GpioInt > + * resources, making this fallback ambiguous, in many common cases > + * the device will only expose one IRQ, and this fallback > + * allows a common code path across either kind of resource. 
> + */ > + if (num == 0 && has_acpi_companion(>dev)) > + return acpi_dev_gpio_irq_get(ACPI_COMPANION(>dev), num); For ACPI devices, this changes the return code for a missing interrupt 0 from ENXIO to ENOENT, because acpi_dev_gpio_irq_get() uses ENOENT instead of ENXIO. While ENXIO isn't exactly documented as the *specific* error code for a missing interrupt in platform_get_irq(), there are definitely drivers out there that are looking specifically for ENXIO (grepping the tree finds several Rockchip platform drivers and a few ethernet drivers at a minimum). And it also incidentally broke some usage of the very driver you were trying to support (drivers/platform/chrome/cros_ec_lpc.c). I suspect a good strategy here would be to check acpi_dev_gpio_irq_get()'s return codes here with something like: if (ret > 0 || ret == -EPROBE_DEFER) return ret; return -ENXIO; Although, the gpiolib functions embedded in there also can return EIO, so maybe something like this is better? if (ret == -ENOENT || ret == 0) return -ENXIO; return ret; I'm kinda unsure what to do with error codes besides PROBE_DEFER or "missing", since most users don't really have it in their mind that platform_get_irq() can fail with EIO or similar. Brian > + > + return -ENXIO; > #endif > } > EXPORT_SYMBOL_GPL(platform_get_irq); > -- > 2.20.1.791.gb4d0f1c61a-goog >
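For comparison, the two return-code strategies Brian floats above can be written out as small pure functions. This is only a userspace sketch: strict_gpio_irq_ret() and lenient_gpio_irq_ret() are hypothetical names, and EPROBE_DEFER is defined locally because it is a kernel-internal errno that userspace headers do not provide (517 is assumed here).

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal errno, not in userspace <errno.h>; value assumed. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * First suggestion: pass through a valid IRQ or a probe deferral,
 * collapse everything else (including -EIO) to -ENXIO.
 */
static int strict_gpio_irq_ret(int ret)
{
	if (ret > 0 || ret == -EPROBE_DEFER)
		return ret;
	return -ENXIO;
}

/*
 * Second suggestion: only translate "missing" (-ENOENT or 0) to
 * -ENXIO and let other errors such as -EIO reach the caller.
 */
static int lenient_gpio_irq_ret(int ret)
{
	if (ret == -ENOENT || ret == 0)
		return -ENXIO;
	return ret;
}
```

The two differ only in what a caller sees for errors other than "missing" and probe deferral: the first hides -EIO behind -ENXIO, the second exposes it to drivers that may only be checking for -ENXIO.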
Re: xen/evtchn and forced threaded irq
Hi, On 20/02/2019 17:07, Boris Ostrovsky wrote: On 2/20/19 9:15 AM, Julien Grall wrote: Hi Boris, Thank you for your answer. On 20/02/2019 00:02, Boris Ostrovsky wrote: On Tue, Feb 19, 2019 at 05:31:10PM +, Julien Grall wrote: Hi all, I have been looking at using Linux RT in Dom0. Once the guest is started, the console is ending to have a lot of warnings (see trace below). After some investigation, this is because the irq handler will now be threaded. I can reproduce the same error with the vanilla Linux when passing the option 'threadirqs' on the command line (the trace below is from 5.0.0-rc7 that has not RT support). FWIW, the interrupt for port 6 is used for the guest to communicate with xenstore. From my understanding, this is happening because the interrupt handler is now run in a thread. So we can have the following happening.

   Interrupt context          | Interrupt thread
                              |
   receive interrupt port 6   |
   clear the evtchn port      |
   set IRQF_RUNTHREAD         |
   kick interrupt thread      |
                              | clear IRQF_RUNTHREAD
                              | call evtchn_interrupt
   receive interrupt port 6   |
   clear the evtchn port      |
   set IRQF_RUNTHREAD         |
   kick interrupt thread      |
                              | disable interrupt port 6
                              | evtchn->enabled = false
                              | []
                              |
                              | *** Handling the second interrupt ***
                              | clear IRQF_RUNTHREAD
                              | call evtchn_interrupt
                              | WARN(...)

I am not entirely sure how to fix this. I have two solutions in mind: 1) Prevent the interrupt handler to be threaded. We would also need to switch from spin_lock to raw_spin_lock as the former may sleep on RT-Linux. 2) Remove the warning I think access to evtchn->enabled is racy so (with or without the warning) we can't use it reliably. Thinking about it, it would not be the only issue. The ring is sized to contain only one instance of the same event. So if you receive twice the event, you may overflow the ring. Hm... That's another argument in favor of "unthreading" the handler. I first thought it would be possible to unthread it.
However, wake_up_interruptible is using a spin_lock. On RT spin_lock can sleep, so this cannot be used in an interrupt context. So I think "unthreading" the handler is not an option here. Another alternative could be to queue the irq if !evtchn->enabled and handle it in evtchn_write() (which is where irq is supposed to be re-enabled). What do you mean by queue? Is it queueing in the ring? No, I was thinking about having a new structure for deferred interrupts. Hmmm, I am not entirely sure what would be the structure here. Could you expand your thinking? Cheers, -- Julien Grall
Re: [PATCH V19 5/7] i2c: tegra: Add DMA support
20.02.2019 21:02, Jon Hunter пишет: > > On 12/02/2019 19:06, Sowjanya Komatineni wrote: >> This patch adds DMA support for Tegra I2C. >> >> Tegra I2C TX and RX FIFO depth is 8 words. PIO mode is used for >> transfer size of the max FIFO depth and DMA mode is used for >> transfer size higher than max FIFO depth to save CPU overhead. >> >> PIO mode needs full intervention of CPU to fill or empty FIFO's >> and also need to service multiple data requests interrupt for the >> same transaction. This adds delay between data bytes of the same >> transfer when CPU is fully loaded and some slave devices has >> internal timeout for no bus activity and stops transaction to >> avoid bus hang. DMA mode is helpful in such cases. >> >> DMA mode is also helpful for Large transfers during downloading or >> uploading FW over I2C to some external devices. >> >> Tegra210 and prior Tegra chips use APBDMA driver which is replaced >> with GPCDMA on Tegra186 and Tegra194. >> This patch uses has_apb_dma flag in hw_feature to differentiate >> DMA driver change between Tegra chipset. >> >> APBDMA driver is registered from module-init level and this patch >> also has a change to register I2C driver at module-init level >> rather than subsys-init to avoid deferring I2C probe till APBDMA >> driver is registered. >> >> Acked-by: Thierry Reding >> Reviewed-by: Dmitry Osipenko >> Tested-by: Dmitry Osipenko >> Signed-off-by: Sowjanya Komatineni > > ... 
> >> +static int tegra_i2c_init_dma(struct tegra_i2c_dev *i2c_dev) >> +{ >> +struct dma_chan *chan; >> +u32 *dma_buf; >> +dma_addr_t dma_phys; >> +int err; >> + >> +if (!IS_ENABLED(CONFIG_TEGRA20_APB_DMA) || >> +!i2c_dev->hw->has_apb_dma) { >> +err = -ENODEV; >> +goto err_out; >> +} >> + >> +chan = dma_request_slave_channel_reason(i2c_dev->dev, "rx"); >> +if (IS_ERR(chan)) { >> +err = PTR_ERR(chan); >> +goto err_out; >> +} >> + >> +i2c_dev->rx_dma_chan = chan; >> + >> +chan = dma_request_slave_channel_reason(i2c_dev->dev, "tx"); >> +if (IS_ERR(chan)) { >> +err = PTR_ERR(chan); >> +goto err_out; >> +} >> + >> +i2c_dev->tx_dma_chan = chan; >> + >> +dma_buf = dma_alloc_coherent(i2c_dev->dev, i2c_dev->dma_buf_size, >> + _phys, GFP_KERNEL | __GFP_NOWARN); >> +if (!dma_buf) { >> +dev_err(i2c_dev->dev, "failed to allocate the DMA buffer\n"); >> +err = -ENOMEM; >> +goto err_out; >> +} >> + >> +i2c_dev->dma_buf = dma_buf; >> +i2c_dev->dma_phys = dma_phys; >> +return 0; >> + >> +err_out: >> +tegra_i2c_release_dma(i2c_dev); >> +if (err != -EPROBE_DEFER) { >> +dev_err(i2c_dev->dev, "cannot use DMA: %d\n", err); >> +dev_err(i2c_dev->dev, "fallbacking to PIO\n"); >> +return 0; >> +} > I think that the above should be a dev_dbg print or re-worked in someway > because now for Tegra194 which does not have an APB DMA I see ... > > [ 6.093234] ERR KERN tegra-i2c 31c.i2c: cannot use DMA: -19 > [ 6.096847] ERR KERN tegra-i2c 31c.i2c: falling back to PIO > > Given that the APB DMA is not supported for Tegra186/Tegra194, there is > no point in printing these error messages. Now it looks like something > is wrong but really it is not :-( Jon, patches are welcome ;)
Re: [PATCH V19 5/7] i2c: tegra: Add DMA support
On 12/02/2019 19:06, Sowjanya Komatineni wrote: > This patch adds DMA support for Tegra I2C. > > Tegra I2C TX and RX FIFO depth is 8 words. PIO mode is used for > transfer size of the max FIFO depth and DMA mode is used for > transfer size higher than max FIFO depth to save CPU overhead. > > PIO mode needs full intervention of CPU to fill or empty FIFO's > and also need to service multiple data requests interrupt for the > same transaction. This adds delay between data bytes of the same > transfer when CPU is fully loaded and some slave devices has > internal timeout for no bus activity and stops transaction to > avoid bus hang. DMA mode is helpful in such cases. > > DMA mode is also helpful for Large transfers during downloading or > uploading FW over I2C to some external devices. > > Tegra210 and prior Tegra chips use APBDMA driver which is replaced > with GPCDMA on Tegra186 and Tegra194. > This patch uses has_apb_dma flag in hw_feature to differentiate > DMA driver change between Tegra chipset. > > APBDMA driver is registered from module-init level and this patch > also has a change to register I2C driver at module-init level > rather than subsys-init to avoid deferring I2C probe till APBDMA > driver is registered. > > Acked-by: Thierry Reding > Reviewed-by: Dmitry Osipenko > Tested-by: Dmitry Osipenko > Signed-off-by: Sowjanya Komatineni ... 
> +static int tegra_i2c_init_dma(struct tegra_i2c_dev *i2c_dev) > +{ > + struct dma_chan *chan; > + u32 *dma_buf; > + dma_addr_t dma_phys; > + int err; > + > + if (!IS_ENABLED(CONFIG_TEGRA20_APB_DMA) || > + !i2c_dev->hw->has_apb_dma) { > + err = -ENODEV; > + goto err_out; > + } > + > + chan = dma_request_slave_channel_reason(i2c_dev->dev, "rx"); > + if (IS_ERR(chan)) { > + err = PTR_ERR(chan); > + goto err_out; > + } > + > + i2c_dev->rx_dma_chan = chan; > + > + chan = dma_request_slave_channel_reason(i2c_dev->dev, "tx"); > + if (IS_ERR(chan)) { > + err = PTR_ERR(chan); > + goto err_out; > + } > + > + i2c_dev->tx_dma_chan = chan; > + > + dma_buf = dma_alloc_coherent(i2c_dev->dev, i2c_dev->dma_buf_size, > + _phys, GFP_KERNEL | __GFP_NOWARN); > + if (!dma_buf) { > + dev_err(i2c_dev->dev, "failed to allocate the DMA buffer\n"); > + err = -ENOMEM; > + goto err_out; > + } > + > + i2c_dev->dma_buf = dma_buf; > + i2c_dev->dma_phys = dma_phys; > + return 0; > + > +err_out: > + tegra_i2c_release_dma(i2c_dev); > + if (err != -EPROBE_DEFER) { > + dev_err(i2c_dev->dev, "cannot use DMA: %d\n", err); > + dev_err(i2c_dev->dev, "fallbacking to PIO\n"); > + return 0; > + } I think that the above should be a dev_dbg print or re-worked in someway because now for Tegra194 which does not have an APB DMA I see ... [ 6.093234] ERR KERN tegra-i2c 31c.i2c: cannot use DMA: -19 [ 6.096847] ERR KERN tegra-i2c 31c.i2c: falling back to PIO Given that the APB DMA is not supported for Tegra186/Tegra194, there is no point in printing these error messages. Now it looks like something is wrong but really it is not :-( Cheers Jon -- nvpublic
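One possible shape for the rework Jon asks for, sketched in userspace: choose the log severity from the error code so that a chip with no APB DMA (-ENODEV) produces a debug message instead of two error lines. This is not from the thread; the enum, the function name and the choice to keep other errors at error level are all assumptions, and EPROBE_DEFER is defined locally since it is a kernel-internal errno (517 assumed).

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal errno, not in userspace <errno.h>; value assumed. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-ins for "print nothing", dev_dbg() and dev_err(). */
enum fake_log_level {
	FAKE_LOG_NONE,
	FAKE_LOG_DBG,
	FAKE_LOG_ERR,
};

/*
 * Decide how loudly the DMA-init failure path should report err
 * before falling back to PIO.
 */
static enum fake_log_level fake_dma_fallback_level(int err)
{
	if (err == -EPROBE_DEFER)
		return FAKE_LOG_NONE;	/* probe is retried, nothing to report */
	if (err == -ENODEV)
		return FAKE_LOG_DBG;	/* no APB DMA on this chip: expected */
	return FAKE_LOG_ERR;		/* e.g. -ENOMEM: worth an error line */
}
```

With this split, the Tegra194 case Jon quotes (err == -19, i.e. -ENODEV) would no longer look like a fault in the boot log.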
Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it
On Wed, Feb 20, 2019 at 07:19:30AM -0800, Christoph Hellwig wrote: > On Tue, Feb 19, 2019 at 09:30:33PM -0800, ira.we...@intel.com wrote: > > From: Ira Weiny > > > > Resending these as I had only 1 minor comment which I believe we have > > covered > > in this series. I was anticipating these going through the mm tree as they > > depend on a cleanup patch there and the IB changes are very minor. But they > > could just as well go through the IB tree. > > > > NOTE: This series depends on my clean up patch to remove the write parameter > > from gup_fast_permitted()[1] > > > > HFI1, qib, and mthca, use get_user_pages_fast() due to it performance > > advantages. These pages can be held for a significant time. But > > get_user_pages_fast() does not protect against mapping of FS DAX pages. > > This I don't get - if you do lock down long term mappings performance > of the actual get_user_pages call shouldn't matter to start with. > > What do I miss? A couple of points. First "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Second, It depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. 
There are patches submitted to the RDMA list which would allow the use of *_fast (they are reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast.[2] As an aside, Jason pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. Ira
Re: [PATCH] x86/nmi: ratelimit unknown nmi logs
On Wed, Feb 20, 2019 at 12:59 AM Peter Zijlstra wrote: > > On Tue, Feb 19, 2019 at 05:48:36PM -0800, Olof Johansson wrote: > > Getting notified of unknown NMIs is obviously important, but getting > > notified on every single one, especially on larger systems with slow > > (serial) console causes more harm than good when it's a known noisy > > non-relevant event. > > > > So, let's ratelimit to avoid locking up the system. > > What kind of bonghit broken crap system is that? > > That is; this _really_ should not happen, and this is a bandaid, not > fixing the cause. Oh, I agree -- this shouldn't happen, and it's being debugged and fixed. So, I'm not looking at this as a bandaid to the real problem, but there's also no reason to DoS the system with printk when it does occur. If you want to configure the system to panic on unknown NMI there are already hooks for it. I'm obviously happy to carry local patches for this, since it's a temporary problem. But yet again, I don't see a reason to have the kernel run off the rails for this condition. -Olof
Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case
On Wed, 20 Feb 2019 11:38:22 -0600 Tom Zanussi wrote: > Hi Steve, > > On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote: > > On Wed, 13 Feb 2019 17:42:55 -0600 > > Tom Zanussi wrote: > > > > > From: Tom Zanussi > > > > > > Add a test case verifying that basic action combinations fail as > > > expected. > > > > > > > Hi Tom, > > > > This test appears to fail: > > > > # echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > > >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger > > -bash: echo: write error: Invalid argument > > > > # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist > > > > ERROR: action parsing: Handler doesn't support action: save > > Last command: keys=comm:onmatch(sched.sched_wakeup).save(comm,prio) > > > > > > Is the "save" feature implemented here? It's in the README too. > > Should > > it be removed? > > > > The "save" feature is implemented, but it's not currently supported > with onmatch(), which is why it fails, and is used in the xfail test, > since it's expected to. So, in this case, the command fails, which > means the xfail test actually passed. ;-) > > There are other tests in the inter-event testcases that use save() but > with onmax() and onchange(), and they pass. So the test needs to pass on failure? Because, it shouldn't be flagged as a failure in the test suite. 
-- Steve > > Hope that explains things in this case, > > Tom > > > -- Steve > > > > > Signed-off-by: Tom Zanussi > > > --- > > > .../inter-event/trigger-action-hist-xfail.tc | 30 > > > ++ > > > 1 file changed, 30 insertions(+) > > > create mode 100644 > > > tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger- > > > action-hist-xfail.tc > > > > > > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > event/trigger-action-hist-xfail.tc > > > b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > event/trigger-action-hist-xfail.tc > > > new file mode 100644 > > > index ..1221240f8cf6 > > > --- /dev/null > > > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > > event/trigger-action-hist-xfail.tc > > > @@ -0,0 +1,30 @@ > > > +#!/bin/sh > > > +# SPDX-License-Identifier: GPL-2.0 > > > +# description: event trigger - test inter-event histogram trigger > > > expected fail actions > > > + > > > +fail() { #msg > > > +echo $1 > > > +exit_fail > > > +} > > > + > > > +if [ ! -f set_event ]; then > > > +echo "event tracing is not supported" > > > +exit_unsupported > > > +fi > > > + > > > +if [ ! -f snapshot ]; then > > > +echo "snapshot is not supported" > > > +exit_unsupported > > > +fi > > > + > > > +grep -q "snapshot()" README || exit_unsupported # version issue > > > + > > > +echo "Test expected snapshot action failure" > > > + > > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >> > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && > > > exit_fail > > > + > > > +echo "Test expected save action failure" > > > + > > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > > > >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && > > > exit_fail > > > + > > > +exit_xfail > > > >
Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node
On 20/02/2019 18:08, Peter Zijlstra wrote: > On Wed, Feb 20, 2019 at 05:55:20PM +0100, Laurent Vivier wrote: >> index 3f35ba1d8fde..372278605f0d 100644 >> --- a/kernel/sched/topology.c >> +++ b/kernel/sched/topology.c >> @@ -1651,6 +1651,7 @@ void sched_init_numa(void) >> */ >> tl[i++] = (struct sched_domain_topology_level){ >> .mask = sd_numa_mask, >> +.flags = SDTL_OVERLAP, > > This makes no sense what so ever. The numa identify node should not have > overlap with other domains. > > Are you sure this is not because of the utterly broken powerpc nonsense > where they move CPUs between nodes? No, I'm not sure. This is why I've Cc'ed the powerpc folks. My conclusion is only based on the before/after changes. I've tested some patches from powerpc ML, but they don't fix this problem: powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update powerpc/pseries: Perform full re-add of CPU for topology update post-migration So the only reason I can see to have a corrupted sched_group list is that the sched_domain_span() function doesn't return a correct cpumask for the domain once a new CPU is added. Thanks, Laurent
Re: [PATCH] iwlwifi: mvm: Use div64_s64 instead of do_div in iwl_mvm_debug_range_resp
On Wed, Feb 20, 2019 at 11:51:34AM +0100, Arnd Bergmann wrote: > On Tue, Feb 19, 2019 at 7:22 PM Nathan Chancellor > wrote: > > > > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c > > b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c > > index e9822a3ec373..92b22250eb7d 100644 > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c > > @@ -462,7 +462,7 @@ static void iwl_mvm_debug_range_resp(struct iwl_mvm > > *mvm, u8 index, > > { > > s64 rtt_avg = res->ftm.rtt_avg * 100; > > > > - do_div(rtt_avg, ); > > + div64_s64(rtt_avg, ); > > This is wrong: div64_s64 does not modify its argument like do_div(), but > it returns the result instead. You also don't want to divide by a 64-bit > value when the second argument is a small constant. > > I think the correct way should be > >s64 rtt_avg = div_s64(res->ftm.rtt_avg * 100, ); > > If you know that the value is positive, using unsigned types > and div_u64() would be slightly faster. > > Arnd Thanks for the review and explanation, Arnd. Luca, could you drop this version so I can resend it? Nathan
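Arnd's point about calling conventions can be seen with userspace stand-ins: a do_div()-style helper updates its dividend in place and hands back the remainder, while a div_s64()-style helper leaves its argument alone and returns the quotient, so ignoring the return value silently drops the result. The fake_ helpers below are hypothetical models, not the kernel implementations, and the divisor is arbitrary because the constant is missing from the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models do_div(): the dividend is updated in place and the remainder
 * is returned. (The kernel version is a macro taking an lvalue; a
 * pointer is used here instead.)
 */
static uint32_t fake_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Models div_s64(): signed 64-bit dividend, 32-bit divisor, quotient
 * returned, argument untouched.
 */
static int64_t fake_div_s64(int64_t dividend, int32_t divisor)
{
	return dividend / divisor;
}

/* The pattern Arnd suggests: scale first, then keep the returned quotient. */
static int64_t fake_rtt_avg_scaled(int64_t rtt_avg_raw, int32_t divisor)
{
	return fake_div_s64(rtt_avg_raw * 100, divisor);
}
```

A one-for-one textual swap of do_div() for a div64_s64()-style call therefore compiles but leaves rtt_avg undivided, which is the bug Arnd spotted in the quoted hunk.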
Re: [PATCH v5 0/3] drm/vc4: Add a load tracker
Paul Kocialkowski writes: > Hi, > > Here is a fourth iteration of the VC4 load tracking series, which was > initially developed by Boris Brezillon and that I have now taken over. > > This new iteration takes in account comments from v3 and comes with a > new approach for avoiding underrun reports when reconfiguring the > pipeline. It is now based on detection instead of delaying the underrun > interrupt unmasking. > > It can be tested with a dedicated IGT GPU Tools series: > VC4 load tracker testing Series is: Reviewed-by: Eric Anholt Thanks for persisting on this!
Re: [PATCH v2 1/3] libertas_tf: move hardware callbacks to a separate structure
Lubomir Rintel wrote: > We'll need to talk to the firmware to get a hardware address before > device is registered with ieee80211 subsystem at the end of > lbtf_add_card(). Hooking the callbacks after that is too late. > > Signed-off-by: Lubomir Rintel 3 patches applied to wireless-drivers-next.git, thanks. be9d0d3fe139 libertas_tf: move hardware callbacks to a separate structure baa0280f08c7 libertas_tf: don't defer firmware loading until start() 5d04b22b881d libertas_tf: get the MAC address before registering the device -- https://patchwork.kernel.org/patch/10821819/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH][next] rtlwifi: rtl8192ce: fix typo, "PairwiseENcAlgorithm" -> "PairwiseEncAlgorithm"
Colin King wrote: > From: Colin Ian King > > There is an uppercase 'N' that should be a lowercase 'n', fix this. > > Signed-off-by: Colin Ian King Patch applied to wireless-drivers-next.git, thanks. 0421dd4167ec rtlwifi: rtl8192ce: fix typo, "PairwiseENcAlgorithm" -> "PairwiseEncAlgorithm" -- https://patchwork.kernel.org/patch/10821751/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Applied "regulator: max77620: Add missing .owner field in regulator_desc" to the regulator tree
The patch regulator: max77620: Add missing .owner field in regulator_desc has been applied to the regulator tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 96173b8c8b1cbbc39436b1592f37255ee5e723cb Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Wed, 20 Feb 2019 09:53:27 +0800 Subject: [PATCH] regulator: max77620: Add missing .owner field in regulator_desc Add missing .owner field in regulator_desc, which is used for refcounting. 
Signed-off-by: Axel Lin Signed-off-by: Mark Brown --- drivers/regulator/max77620-regulator.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/regulator/max77620-regulator.c b/drivers/regulator/max77620-regulator.c index cd93cf53e23c..1607ac673e44 100644 --- a/drivers/regulator/max77620-regulator.c +++ b/drivers/regulator/max77620-regulator.c @@ -690,6 +690,7 @@ static const struct regulator_ops max77620_regulator_ops = { .active_discharge_mask = MAX77620_SD_CFG1_ADE_MASK, \ .active_discharge_reg = MAX77620_REG_##_id##_CFG, \ .type = REGULATOR_VOLTAGE, \ + .owner = THIS_MODULE, \ }, \ } @@ -721,6 +722,7 @@ static const struct regulator_ops max77620_regulator_ops = { .active_discharge_mask = MAX77620_LDO_CFG2_ADE_MASK, \ .active_discharge_reg = MAX77620_REG_##_id##_CFG2, \ .type = REGULATOR_VOLTAGE, \ + .owner = THIS_MODULE, \ }, \ } -- 2.20.1
Applied "regulator: twl6030: Use regulator_list_voltage_linear_range for twl6030ldo_ops" to the regulator tree
The patch regulator: twl6030: Use regulator_list_voltage_linear_range for twl6030ldo_ops has been applied to the regulator tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 4a43870ae166a1a0bbc44a7b5ee63653303827bb Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Sun, 17 Feb 2019 21:48:08 +0800 Subject: [PATCH] regulator: twl6030: Use regulator_list_voltage_linear_range for twl6030ldo_ops Use linear range to replace the twl6030ldo_list_voltage implementation. With this change, the min_mV is not used and can be removed from struct twlreg_info. 
Signed-off-by: Axel Lin Signed-off-by: Mark Brown --- drivers/regulator/twl6030-regulator.c | 73 ++- 1 file changed, 27 insertions(+), 46 deletions(-) diff --git a/drivers/regulator/twl6030-regulator.c b/drivers/regulator/twl6030-regulator.c index 78b964539775..dcaa6512d760 100644 --- a/drivers/regulator/twl6030-regulator.c +++ b/drivers/regulator/twl6030-regulator.c @@ -31,9 +31,6 @@ struct twlreg_info { /* twl resource ID, for resource control state machine */ u8 id; - /* chip constraints on regulator behavior */ - u16 min_mV; - u8 flags; /* used by regulator core */ @@ -252,27 +249,6 @@ static struct regulator_ops twl6030coresmps_ops = { .get_voltage= twl6030coresmps_get_voltage, }; -static int twl6030ldo_list_voltage(struct regulator_dev *rdev, unsigned sel) -{ - struct twlreg_info *info = rdev_get_drvdata(rdev); - - switch (sel) { - case 0: - return 0; - case 1 ... 24: - /* Linear mapping from 0001 to 00011000: -* Absolute voltage value = 1.0 V + 0.1 V × (sel – 0001) -*/ - return (info->min_mV + 100 * (sel - 1)) * 1000; - case 25 ... 
30: - return -EINVAL; - case 31: - return 275; - default: - return -EINVAL; - } -} - static int twl6030ldo_set_voltage_sel(struct regulator_dev *rdev, unsigned selector) { @@ -291,7 +267,7 @@ static int twl6030ldo_get_voltage_sel(struct regulator_dev *rdev) } static struct regulator_ops twl6030ldo_ops = { - .list_voltage = twl6030ldo_list_voltage, + .list_voltage = regulator_list_voltage_linear_range, .set_voltage_sel = twl6030ldo_set_voltage_sel, .get_voltage_sel = twl6030ldo_get_voltage_sel, @@ -513,6 +489,11 @@ static struct regulator_ops twlsmps_ops = { }; /*--*/ +static const struct regulator_linear_range twl6030ldo_linear_range[] = { + REGULATOR_LINEAR_RANGE(0, 0, 0, 0), + REGULATOR_LINEAR_RANGE(100, 1, 24, 10), + REGULATOR_LINEAR_RANGE(275, 31, 31, 0), +}; #define TWL6030_ADJUSTABLE_SMPS(label) \ static const struct twlreg_info TWL6030_INFO_##label = { \ @@ -525,28 +506,30 @@ static const struct twlreg_info TWL6030_INFO_##label = { \ }, \ } -#define TWL6030_ADJUSTABLE_LDO(label, offset, min_mVolts) \ +#define TWL6030_ADJUSTABLE_LDO(label, offset) \ static const struct twlreg_info TWL6030_INFO_##label = { \ .base = offset, \ - .min_mV = min_mVolts, \ .desc = { \ .name = #label, \ .id = TWL6030_REG_##label, \ .n_voltages = 32, \ + .linear_ranges = twl6030ldo_linear_range, \ + .n_linear_ranges = ARRAY_SIZE(twl6030ldo_linear_range), \ .ops = _ops, \ .type = REGULATOR_VOLTAGE, \ .owner = THIS_MODULE, \ }, \ } -#define TWL6032_ADJUSTABLE_LDO(label, offset, min_mVolts) \ +#define TWL6032_ADJUSTABLE_LDO(label, offset) \ static const struct twlreg_info TWL6032_INFO_##label = { \ .base = offset, \ - .min_mV = min_mVolts, \ .desc = { \ .name = #label, \ .id = TWL6032_REG_##label, \ .n_voltages = 32, \ + .linear_ranges = twl6030ldo_linear_range, \ + .n_linear_ranges =
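The idea behind replacing a hand-written `list_voltage` switch with a table of linear ranges can be sketched in userspace C as follows. This is only an illustration: `struct sketch_linear_range` and `sketch_list_voltage()` are hypothetical simplifications, not the regulator core's types or helpers. The middle range uses the rule from the removed in-code comment (1.0 V + 0.1 V per selector step for selectors 1 to 24); the single-selector range at 31 is left out because its constant is not legible in the flattened patch above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified counterpart of a linear-range table entry. */
struct sketch_linear_range {
	unsigned int min_uV;	/* voltage at min_sel, in microvolts */
	unsigned int min_sel;	/* first selector covered by the range */
	unsigned int max_sel;	/* last selector covered by the range */
	unsigned int step_uV;	/* voltage increase per selector step */
};

/* Selector 0 means 0 V; selectors 1..24 follow 1.0 V + 0.1 V * (sel - 1). */
static const struct sketch_linear_range sketch_ldo_ranges[] = {
	{ 0,       0, 0,  0      },
	{ 1000000, 1, 24, 100000 },
};

/* Return the voltage in microvolts for a selector, or -EINVAL if no range covers it. */
static int sketch_list_voltage(const struct sketch_linear_range *ranges,
			       size_t n_ranges, unsigned int sel)
{
	for (size_t i = 0; i < n_ranges; i++) {
		const struct sketch_linear_range *r = &ranges[i];

		if (sel < r->min_sel || sel > r->max_sel)
			continue;
		return (int)(r->min_uV + (sel - r->min_sel) * r->step_uV);
	}
	return -EINVAL;	/* gap between ranges, e.g. selector 25 */
}
```

Selectors that fall in a gap between ranges come back as `-EINVAL`, matching what the removed switch statement did for 25 to 30.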
Applied "regulator: tps65218.c: fix LS3 issues" to the regulator tree
The patch regulator: tps65218.c: fix LS3 issues has been applied to the regulator tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 71a64ba2031f4b67769618b9e35389906026130d Mon Sep 17 00:00:00 2001 From: Christian Hohnstaedt Date: Wed, 20 Feb 2019 09:15:50 +0100 Subject: [PATCH] regulator: tps65218.c: fix LS3 issues MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix list of valid LS3 currents from mA to µA - Fix selection of min/max microAmps of LS3. 
Selecting one of the configured values as max value now really selects it instead of the next lower one Signed-off-by: Christian Hohnstaedt Signed-off-by: Mark Brown --- drivers/regulator/tps65218-regulator.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/regulator/tps65218-regulator.c b/drivers/regulator/tps65218-regulator.c index 6209beee1018..5dd559eabc81 100644 --- a/drivers/regulator/tps65218-regulator.c +++ b/drivers/regulator/tps65218-regulator.c @@ -188,7 +188,8 @@ static struct regulator_ops tps65218_ldo1_dcdc34_ops = { .set_suspend_disable= tps65218_pmic_set_suspend_disable, }; -static const int ls3_currents[] = { 100, 200, 500, 1000 }; +static const int ls3_currents[] = { 10, 20, 50, 100 }; + static int tps65218_pmic_set_input_current_lim(struct regulator_dev *dev, int lim_uA) @@ -214,7 +215,7 @@ static int tps65218_pmic_set_current_limit(struct regulator_dev *dev, unsigned int num_currents = ARRAY_SIZE(ls3_currents); struct tps65218 *tps = rdev_get_drvdata(dev); - while (index < num_currents && ls3_currents[index] < max_uA) + while (index < num_currents && ls3_currents[index] <= max_uA) index++; index--; -- 2.20.1
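The effect of changing `<` to `<=` in the selection loop can be reproduced with a userspace sketch. The function name and the `inclusive` flag are inventions for this illustration; the table holds the old mA list (100, 200, 500, 1000) converted to microamps as the commit message describes, and returning -1 when nothing fits is an assumption about how the caller treats that case.

```c
#include <stddef.h>

/* LS3 limits in microamps: the old mA values multiplied by 1000. */
static const int sketch_ls3_currents_uA[] = { 100000, 200000, 500000, 1000000 };

/*
 * Return the index of the largest table entry allowed by max_uA, or -1 if
 * none fits. inclusive == 0 mimics the old "<" comparison, inclusive != 0
 * mimics the fixed "<=" comparison.
 */
static int sketch_pick_limit(int max_uA, int inclusive)
{
	int num = (int)(sizeof(sketch_ls3_currents_uA) /
			sizeof(sketch_ls3_currents_uA[0]));
	int index = 0;

	while (index < num &&
	       (inclusive ? sketch_ls3_currents_uA[index] <= max_uA
			  : sketch_ls3_currents_uA[index] < max_uA))
		index++;

	return index - 1;
}
```

Asking for exactly 500000 uA yields index 1 (200000 uA) with the old comparison and index 2 (500000 uA) with the fixed one, which is the "next lower one" problem the commit message mentions.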
Applied "regulator: twl6030: Constify regulator_ops" to the regulator tree
The patch regulator: twl6030: Constify regulator_ops has been applied to the regulator tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 606640bbbe449d05d12b51b0500e6b535ec54987 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Sun, 17 Feb 2019 21:48:09 +0800 Subject: [PATCH] regulator: twl6030: Constify regulator_ops Signed-off-by: Axel Lin Signed-off-by: Mark Brown --- drivers/regulator/twl6030-regulator.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/regulator/twl6030-regulator.c b/drivers/regulator/twl6030-regulator.c index dcaa6512d760..15f19df6bc5d 100644 --- a/drivers/regulator/twl6030-regulator.c +++ b/drivers/regulator/twl6030-regulator.c @@ -244,7 +244,7 @@ static int twl6030coresmps_get_voltage(struct regulator_dev *rdev) return -ENODEV; } -static struct regulator_ops twl6030coresmps_ops = { +static const struct regulator_ops twl6030coresmps_ops = { .set_voltage= twl6030coresmps_set_voltage, .get_voltage= twl6030coresmps_get_voltage, }; @@ -266,7 +266,7 @@ static int twl6030ldo_get_voltage_sel(struct regulator_dev *rdev) return vsel; } -static struct regulator_ops twl6030ldo_ops = { +static const struct regulator_ops twl6030ldo_ops = { .list_voltage = 
regulator_list_voltage_linear_range, .set_voltage_sel = twl6030ldo_set_voltage_sel, @@ -281,7 +281,7 @@ static struct regulator_ops twl6030ldo_ops = { .get_status = twl6030reg_get_status, }; -static struct regulator_ops twl6030fixed_ops = { +static const struct regulator_ops twl6030fixed_ops = { .list_voltage = regulator_list_voltage_linear, .enable = twl6030reg_enable, @@ -472,7 +472,7 @@ static int twl6030smps_get_voltage_sel(struct regulator_dev *rdev) return twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_VOLTAGE_SMPS); } -static struct regulator_ops twlsmps_ops = { +static const struct regulator_ops twlsmps_ops = { .list_voltage = twl6030smps_list_voltage, .map_voltage= twl6030smps_map_voltage, -- 2.20.1
Applied "regulator: max77650: Add missing .owner field in regulator_desc" to the regulator tree
The patch regulator: max77650: Add missing .owner field in regulator_desc has been applied to the regulator tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 721efb504d28ce0bc704643ea2d952b9e87ed9f7 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Wed, 20 Feb 2019 09:53:28 +0800 Subject: [PATCH] regulator: max77650: Add missing .owner field in regulator_desc Add missing .owner field in regulator_desc, which is used for refcounting. 
Signed-off-by: Axel Lin Signed-off-by: Mark Brown --- drivers/regulator/max77650-regulator.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/regulator/max77650-regulator.c b/drivers/regulator/max77650-regulator.c index 5afb91400832..411912d5278b 100644 --- a/drivers/regulator/max77650-regulator.c +++ b/drivers/regulator/max77650-regulator.c @@ -319,6 +319,7 @@ static struct max77650_regulator_desc max77650_LDO_desc = { .active_discharge_reg = MAX77650_REG_CNFG_LDO_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_LDO_A, .regB = MAX77650_REG_CNFG_LDO_B, @@ -343,6 +344,7 @@ static struct max77650_regulator_desc max77650_SBB0_desc = { .active_discharge_reg = MAX77650_REG_CNFG_SBB0_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_SBB0_A, .regB = MAX77650_REG_CNFG_SBB0_B, @@ -367,6 +369,7 @@ static struct max77650_regulator_desc max77650_SBB1_desc = { .active_discharge_reg = MAX77650_REG_CNFG_SBB1_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_SBB1_A, .regB = MAX77650_REG_CNFG_SBB1_B, @@ -390,6 +393,7 @@ static struct max77650_regulator_desc max77651_SBB1_desc = { .active_discharge_reg = MAX77650_REG_CNFG_SBB1_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_SBB1_A, .regB = MAX77650_REG_CNFG_SBB1_B, @@ -414,6 +418,7 @@ static struct max77650_regulator_desc max77650_SBB2_desc = { .active_discharge_reg = MAX77650_REG_CNFG_SBB2_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_SBB2_A, .regB = MAX77650_REG_CNFG_SBB2_B, @@ -438,6 +443,7 @@ static struct max77650_regulator_desc max77651_SBB2_desc = { .active_discharge_reg = MAX77650_REG_CNFG_SBB2_B, .enable_time= 100, .type = REGULATOR_VOLTAGE, + .owner = THIS_MODULE, }, .regA = MAX77650_REG_CNFG_SBB2_A, .regB = MAX77650_REG_CNFG_SBB2_B, -- 2.20.1
Applied "ASoC: samsung: odroid: Fix of_node refcount unbalance" to the asoc tree
The patch ASoC: samsung: odroid: Fix of_node refcount unbalance has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From d832d2b246c516eacb2d0ba53ec17ed59c3cd62b Mon Sep 17 00:00:00 2001 From: Sylwester Nawrocki Date: Wed, 20 Feb 2019 12:06:07 +0100 Subject: [PATCH] ASoC: samsung: odroid: Fix of_node refcount unbalance In odroid_audio_probe() some OF nodes are left without reference count decrease after use. Fix it by ensuring required of_node_calls() are done before exiting probe. 
Reported-by: Takashi Iwai Signed-off-by: Sylwester Nawrocki Signed-off-by: Mark Brown --- sound/soc/samsung/odroid.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/sound/soc/samsung/odroid.c b/sound/soc/samsung/odroid.c index bd2c5163dc7f..c3b0f6c612cb 100644 --- a/sound/soc/samsung/odroid.c +++ b/sound/soc/samsung/odroid.c @@ -257,27 +257,31 @@ static int odroid_audio_probe(struct platform_device *pdev) ret = of_parse_phandle_with_args(cpu, "sound-dai", "#sound-dai-cells", i, ); if (ret < 0) - return ret; + break; if (!args.np) { dev_err(dev, "sound-dai property parse error: %d\n", ret); - return -EINVAL; + ret = -EINVAL; + break; } ret = snd_soc_get_dai_name(, >cpu_dai_name); of_node_put(args.np); if (ret < 0) - return ret; + break; } + if (ret == 0) + cpu_dai = of_parse_phandle(cpu, "sound-dai", 0); - cpu_dai = of_parse_phandle(cpu, "sound-dai", 0); of_node_put(cpu); of_node_put(codec); + if (ret < 0) + return ret; ret = snd_soc_of_get_dai_link_codecs(dev, codec, codec_link); if (ret < 0) - goto err_put_codec_n; + goto err_put_cpu_dai; /* Set capture capability only for boards with the MAX98090 CODEC */ if (codec_link->num_codecs > 1) { @@ -288,7 +292,7 @@ static int odroid_audio_probe(struct platform_device *pdev) priv->sclk_i2s = of_clk_get_by_name(cpu_dai, "i2s_opclk1"); if (IS_ERR(priv->sclk_i2s)) { ret = PTR_ERR(priv->sclk_i2s); - goto err_put_codec_n; + goto err_put_cpu_dai; } priv->clk_i2s_bus = of_clk_get_by_name(cpu_dai, "iis"); @@ -310,7 +314,8 @@ static int odroid_audio_probe(struct platform_device *pdev) clk_put(priv->clk_i2s_bus); err_put_sclk: clk_put(priv->sclk_i2s); -err_put_codec_n: +err_put_cpu_dai: + of_node_put(cpu_dai); snd_soc_of_put_dai_link_codecs(codec_link); return ret; } -- 2.20.1
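The general pattern the fix applies, replacing early returns with a path that still drops every reference taken so far, can be sketched with a mock reference counter. Nothing below is the OF or ASoC API: `struct sketch_node`, the get/put helpers and the `fail_step` parameter are all hypothetical, and releasing `cpu_dai` on the success path is an assumption made so that the sketch has a single exit.

```c
#include <stddef.h>

/* Mock device node with a plain integer reference count. */
struct sketch_node {
	int refcount;
};

static struct sketch_node *sketch_node_get(struct sketch_node *n)
{
	n->refcount++;
	return n;
}

static void sketch_node_put(struct sketch_node *n)
{
	if (n)
		n->refcount--;
}

/*
 * fail_step: 0 = no failure, 1 = failure while parsing,
 *            2 = failure after the cpu_dai reference was taken.
 */
static int sketch_probe(struct sketch_node *cpu, struct sketch_node *codec,
			struct sketch_node *dai, int fail_step)
{
	struct sketch_node *cpu_dai = NULL;
	int ret = 0;

	sketch_node_get(cpu);
	sketch_node_get(codec);

	if (fail_step == 1)
		ret = -1;	/* record the error instead of returning early */

	if (ret == 0)
		cpu_dai = sketch_node_get(dai);

	/* Both nodes are released on every path, failing or not. */
	sketch_node_put(cpu);
	sketch_node_put(codec);
	if (ret < 0)
		return ret;

	if (fail_step == 2)
		ret = -1;	/* later failure: fall into the cleanup below */

	sketch_node_put(cpu_dai);
	return ret;
}
```

Whichever step fails, the counters end where they started, which is the invariant the patch restores for the probe function.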
Re: [PATCH v4 2/2] media: cedrus: Add H264 decoding support
Hi! I really wanted to do another review on previous series but got distracted by analyzing one particularly troublesome H264 sample. It still doesn't work correctly, so I would ask you if you can test it with your stack (it might be a userspace issue): http://jernej.libreelec.tv/videos/problematic/test.mkv Please take a look at my comments below. On Wednesday, 20 February 2019 at 15:17:34 CET, Maxime Ripard wrote: > Introduce some basic H264 decoding support in cedrus. So far, only the > baseline profile videos have been tested, and some more advanced features > used in higher profiles are not even implemented. What is not yet implemented? Multi slice frame decoding, interlaced frames and decoding frames with width > 2048. Anything else? > > Signed-off-by: Maxime Ripard > --- > drivers/staging/media/sunxi/cedrus/Makefile | 3 +- > drivers/staging/media/sunxi/cedrus/cedrus.c | 30 +- > drivers/staging/media/sunxi/cedrus/cedrus.h | 38 +- > drivers/staging/media/sunxi/cedrus/cedrus_dec.c | 13 +- > drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 584 +++- > drivers/staging/media/sunxi/cedrus/cedrus_hw.c| 4 +- > drivers/staging/media/sunxi/cedrus/cedrus_regs.h | 91 ++- > drivers/staging/media/sunxi/cedrus/cedrus_video.c | 9 +- > 8 files changed, 770 insertions(+), 2 deletions(-) > create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > diff --git a/drivers/staging/media/sunxi/cedrus/Makefile > b/drivers/staging/media/sunxi/cedrus/Makefile index > e9dc68b7bcb6..aaf141fc58b6 100644 > --- a/drivers/staging/media/sunxi/cedrus/Makefile > +++ b/drivers/staging/media/sunxi/cedrus/Makefile > @@ -1,3 +1,4 @@ > obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o > > -sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o > cedrus_mpeg2.o +sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o > cedrus_dec.o \ + cedrus_mpeg2.o cedrus_h264.o > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c > b/drivers/staging/media/sunxi/cedrus/cedrus.c index > 
ff11cbeba205..c1607142d998 100644 > --- a/drivers/staging/media/sunxi/cedrus/cedrus.c > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c > @@ -40,6 +40,35 @@ static const struct cedrus_control cedrus_controls[] = { > .codec = CEDRUS_CODEC_MPEG2, > .required = false, > }, > + { > + .id = V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS, > + .elem_size = sizeof(struct v4l2_ctrl_h264_decode_param), > + .codec = CEDRUS_CODEC_H264, > + .required = true, > + }, > + { > + .id = V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS, > + .elem_size = sizeof(struct v4l2_ctrl_h264_slice_param), > + .codec = CEDRUS_CODEC_H264, > + .required = true, > + }, > + { > + .id = V4L2_CID_MPEG_VIDEO_H264_SPS, > + .elem_size = sizeof(struct v4l2_ctrl_h264_sps), > + .codec = CEDRUS_CODEC_H264, > + .required = true, > + }, > + { > + .id = V4L2_CID_MPEG_VIDEO_H264_PPS, > + .elem_size = sizeof(struct v4l2_ctrl_h264_pps), > + .codec = CEDRUS_CODEC_H264, > + .required = true, > + }, > + { > + .id = V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX, > + .elem_size = sizeof(struct v4l2_ctrl_h264_scaling_matrix), > + .codec = CEDRUS_CODEC_H264, > + }, > }; > > #define CEDRUS_CONTROLS_COUNTARRAY_SIZE(cedrus_controls) > @@ -278,6 +307,7 @@ static int cedrus_probe(struct platform_device *pdev) > } > > dev->dec_ops[CEDRUS_CODEC_MPEG2] = _dec_ops_mpeg2; > + dev->dec_ops[CEDRUS_CODEC_H264] = _dec_ops_h264; > > mutex_init(>dev_mutex); > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h > b/drivers/staging/media/sunxi/cedrus/cedrus.h index > 4aedd24a9848..8c64f9a27e9d 100644 > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h > @@ -30,7 +30,7 @@ > > enum cedrus_codec { > CEDRUS_CODEC_MPEG2, > - > + CEDRUS_CODEC_H264, > CEDRUS_CODEC_LAST, > }; > > @@ -40,6 +40,12 @@ enum cedrus_irq_status { > CEDRUS_IRQ_OK, > }; > > +enum cedrus_h264_pic_type { > + CEDRUS_H264_PIC_TYPE_FRAME = 0, > + CEDRUS_H264_PIC_TYPE_FIELD, > + CEDRUS_H264_PIC_TYPE_MBAFF, > +}; > + > struct cedrus_control { 
> u32 id; > u32 elem_size; > @@ -47,6 +53,14 @@ struct cedrus_control { > unsigned char required:1; > }; > > +struct cedrus_h264_run { > + const struct v4l2_ctrl_h264_decode_param*decode_param; > + const struct v4l2_ctrl_h264_pps *pps; > + const
Re: [GIT PULL] mtd: Fixes for 5.0/5.0-rc8
The pull request you sent on Wed, 20 Feb 2019 08:56:53 +0100: > git://git.infradead.org/linux-mtd.git tags/mtd/fixes-for-5.0-rc8 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/7d9d592caf8cc5d91f7923c5e717b69d0b1e246f Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] sound fixes for 5.0
The pull request you sent on Wed, 20 Feb 2019 11:42:46 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/sound-5.0 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/2137397c92aec3713fa10be3c9b830f9a1674e60 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] GPIO fixes for the v5.0 series
The pull request you sent on Wed, 20 Feb 2019 09:03:22 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git > tags/gpio-v5.0-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/c828c2651b9a8184e1414fa0611d18b84d3847dd Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] pin control fixes for v5.0
The pull request you sent on Wed, 20 Feb 2019 09:08:25 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git > tags/pinctrl-v5.0-3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/fb83f15ef9dd984834bc60b380efbeffdf1ecc04 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH v6 4/6] powerpc/32: Add KASAN support
On 19/02/2019 at 18:23, Christophe Leroy wrote: This patch adds KASAN support for PPC32. The KASAN shadow area is located between the vmalloc area and the fixmap area. KASAN_SHADOW_OFFSET is calculated in asm/kasan.h and extracted by Makefile prepare rule via asm-offsets.h For modules, the shadow area is allocated at module_alloc(). Note that on book3s it will only work on the 603 because the other ones use hash table and can therefore not share a single PTE table covering the entire early KASAN shadow area. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 7 ++ arch/powerpc/include/asm/book3s/32/pgtable.h | 2 + arch/powerpc/include/asm/highmem.h| 10 +- arch/powerpc/include/asm/kasan.h | 23 arch/powerpc/include/asm/nohash/32/pgtable.h | 2 + arch/powerpc/include/asm/setup.h | 5 + arch/powerpc/kernel/Makefile | 9 +- arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/head_32.S | 3 + arch/powerpc/kernel/head_40x.S| 3 + arch/powerpc/kernel/head_44x.S| 3 + arch/powerpc/kernel/head_8xx.S| 3 + arch/powerpc/kernel/head_fsl_booke.S | 3 + arch/powerpc/kernel/setup-common.c| 2 + arch/powerpc/lib/Makefile | 8 ++ arch/powerpc/mm/Makefile | 1 + arch/powerpc/mm/kasan/Makefile| 5 + arch/powerpc/mm/kasan/kasan_init_32.c | 147 ++ arch/powerpc/mm/mem.c | 4 + arch/powerpc/mm/ptdump/dump_linuxpagetables.c | 8 ++ @Daniel (and others), note that to apply properly, this requires my other patch, which moves the dumping files into an arch/powerpc/mm/ptdump/ subdir. 
Christophe arch/powerpc/purgatory/Makefile | 3 + arch/powerpc/xmon/Makefile| 1 + 23 files changed, 253 insertions(+), 4 deletions(-) create mode 100644 arch/powerpc/mm/kasan/Makefile create mode 100644 arch/powerpc/mm/kasan/kasan_init_32.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 08908219fba9..850b06def84f 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -175,6 +175,7 @@ config PPC select GENERIC_TIME_VSYSCALL select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_JUMP_LABEL + select HAVE_ARCH_KASAN if PPC32 select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index ac033341ed55..f0738099e31e 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -427,6 +427,13 @@ else endif endif +ifdef CONFIG_KASAN +prepare: kasan_prepare + +kasan_prepare: prepare0 + $(eval KASAN_SHADOW_OFFSET = $(shell awk '{if ($$2 == "KASAN_SHADOW_OFFSET") print $$3;}' include/generated/asm-offsets.h)) +endif + # Check toolchain versions: # - gcc-4.6 is the minimum kernel-wide version so nothing required. checkbin: diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 49d76adb9bc5..4543016f80ca 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -141,6 +141,8 @@ static inline bool pte_user(pte_t pte) */ #ifdef CONFIG_HIGHMEM #define KVIRT_TOP PKMAP_BASE +#elif defined(CONFIG_KASAN) +#define KVIRT_TOP KASAN_SHADOW_START #else #define KVIRT_TOP (0xfe00UL) /* for now, could be FIXMAP_BASE ? 
*/ #endif diff --git a/arch/powerpc/include/asm/highmem.h b/arch/powerpc/include/asm/highmem.h index a4b65b186ec6..483b90025bef 100644 --- a/arch/powerpc/include/asm/highmem.h +++ b/arch/powerpc/include/asm/highmem.h @@ -28,6 +28,7 @@ #include #include #include +#include extern pte_t *kmap_pte; extern pgprot_t kmap_prot; @@ -50,10 +51,15 @@ extern pte_t *pkmap_page_table; #define PKMAP_ORDER 9 #endif #define LAST_PKMAP(1 << PKMAP_ORDER) +#ifdef CONFIG_KASAN +#define PKMAP_TOP KASAN_SHADOW_START +#else +#define PKMAP_TOP FIXADDR_START +#endif #ifndef CONFIG_PPC_4K_PAGES -#define PKMAP_BASE (FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1)) +#define PKMAP_BASE (PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1)) #else -#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK) +#define PKMAP_BASE ((PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK) #endif #define LAST_PKMAP_MASK (LAST_PKMAP-1) #define PKMAP_NR(virt) ((virt-PKMAP_BASE) >> PAGE_SHIFT) diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h index 2efd0e42cfc9..0bc9148f5d87 100644 --- a/arch/powerpc/include/asm/kasan.h +++ b/arch/powerpc/include/asm/kasan.h @@ -12,4
Re: [PATCH] RDMA/mlx4: Spread completion vectors for proxy CQs
> On 20 Feb 2019, at 18:14, Jason Gunthorpe wrote: > > On Tue, Feb 19, 2019 at 06:32:50PM +0100, Håkon Bugge wrote: >> Anyway, Jason mentioned in a private email that maybe we could use the >> new completion API or something? I am not familiar with that one >> (yet). > > I was thinking of the stuff in core/cq.c - but it also doesn't have > automatic comp_vector balancing. It is the logical place to put > something like that though.. > > An API to manage a bundle of CPU affine CQ's is probably what most > ULPs really need.. (it makes little sense to create a unique CQ for > every QP) ULPs behave way differently. E.g. RDS creates one tx and one rx CQ per QP. As I wrote earlier, we do not have any modify_cq() that changes the comp_vector (EQ association). We can balance #CQ associated with the EQs, but we do not know their behaviour. So, assume 2 completion EQs, and four CQs. CQa and CQb are associated with the first EQ, the two others with the second EQ. That's the "best" we can do. But, if CQa and CQb are the only ones generating events, we will have all interrupt processing on a single CPU. But if we now could modify CQa.comp_vector to be that of the second EQ, we could achieve balance. But not sure if the drivers are able to do this at all. > alloc_bundle() You mean alloc a bunch of CQs? How do you know their #cqes and cq_context? Håkon > get_cqn_for_flow(bundle) > alloc_qp() > destroy_qp() > put_cqn_for_flow(bundle) > destroy_bundle(); > > Let the core code balance the cqn's and allocate (shared) CQ > resources. > > Jason
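The two-EQ, four-CQ scenario from the reply above can be played through in a userspace sketch. Everything here is hypothetical and unrelated to the mlx4 or RDMA core code: `sketch_pick_vector()` spreads CQs by count only, and `sketch_event_load()` adds up made-up event rates per EQ to show that an even count does not imply an even interrupt load.

```c
#include <stddef.h>

#define SKETCH_NUM_EQS 2

/* Choose the completion vector (EQ) that currently has the fewest CQs. */
static int sketch_pick_vector(const int cqs_per_eq[SKETCH_NUM_EQS])
{
	int best = 0;

	for (int v = 1; v < SKETCH_NUM_EQS; v++)
		if (cqs_per_eq[v] < cqs_per_eq[best])
			best = v;
	return best;
}

/* Sum the per-CQ event rates for each EQ, given the CQ to EQ assignment. */
static void sketch_event_load(const int *vector_of_cq, const int *events_of_cq,
			      size_t n_cqs, int load[SKETCH_NUM_EQS])
{
	for (int v = 0; v < SKETCH_NUM_EQS; v++)
		load[v] = 0;
	for (size_t i = 0; i < n_cqs; i++)
		load[vector_of_cq[i]] += events_of_cq[i];
}
```

Creating four CQs with the picker yields two per EQ. If the two CQs that landed on the first EQ are the only active ones, all events hit that EQ; moving one of them to the second EQ, which is what a modify operation on the completion vector would allow if it existed, evens out the load.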
Re: xarray reserve/release?
On Wed, Feb 20, 2019 at 09:14:14AM -0800, Matthew Wilcox wrote: > > void __xa_release(struct xarray *xa, unsigned long index) > > { > > XA_STATE(xas, xa, index); > > void *curr; > > > > curr = xas_load(); > > if (curr == XA_ZERO_ENTRY) > > xas_store(, NULL); > > } > > > > ? > > I decided to instead remove the magic from xa_cmpxchg(). I used > to prohibit any internal entry being passed to the regular API, but > I recently changed that with 76b4e5299565 ("XArray: Permit storing > 2-byte-aligned pointers"). Now that we can pass XA_ZERO_ENTRY, I > think this all makes much more sense. Except that for allocating arrays, xa_cmpxchg and xa_store now do different things with NULL. Not necessarily bad, but if you have this ABI variation it should be mentioned in the kdoc comment. This is a bit worrisome though: curr = xas_load(); - if (curr == XA_ZERO_ENTRY) - curr = NULL; if (curr == old) { It means any cmpxchg user has to care explicitly about the possibility of true-NULL vs reserved. Seems like a difficult API. What about writing it like this: if ((curr == XA_ZERO_ENTRY && old == NULL) || curr == old) ? I can't think of a use case to cmpxchg against real-NULL only. And here: xas_store(, entry); - if (xa_track_free(xa)) + if (xa_track_free(xa) && !old) xas_clear_mark(, XA_FREE_MARK); Should this be if (xa_track_free(xa) && entry && !old) ? I.e., we don't want to clear the XA_FREE_MARK if we just wrote NULL. Also, I would think !curr is clearer? I assume the point is to not pay the price of xas_clear_mark if we already know the index stored is marked? > > Also, I wonder if xa_reserve() is better written as > > > >xa_cmpxchg(xa, index, NULL, XA_ZERO_ENTRY) > > > > Bit clearer what is going on.. > > Yes, I agree. I've pushed a couple of new commits to > http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray That looks really readable now that reserve and release are tidy paired operations. Thanks, Jason
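The difference between the two compare rules discussed above can be seen with a single mock slot. This is a userspace sketch, not the XArray implementation: `SKETCH_ZERO_ENTRY` stands in for `XA_ZERO_ENTRY`, both functions operate on a bare pointer instead of an `xa_state`, and returning 1 or 0 for "stored or not" is a simplification chosen for the example.

```c
#include <stddef.h>

/* Stand-in sentinel for a reserved slot; any unique non-NULL address works. */
static char sketch_zero_marker;
#define SKETCH_ZERO_ENTRY ((void *)&sketch_zero_marker)

/* Strict rule: the raw slot content must equal 'old'. Returns 1 if stored. */
static int sketch_cmpxchg_strict(void **slot, void *old, void *entry)
{
	if (*slot != old)
		return 0;
	*slot = entry;
	return 1;
}

/*
 * Lenient rule proposed in the mail: a reserved slot also matches when the
 * caller passes old == NULL. Returns 1 if stored.
 */
static int sketch_cmpxchg_lenient(void **slot, void *old, void *entry)
{
	void *curr = *slot;

	if ((curr == SKETCH_ZERO_ENTRY && old == NULL) || curr == old) {
		*slot = entry;
		return 1;
	}
	return 0;
}
```

Under the strict rule, a caller that reserved a slot and later tries to fill it with `old == NULL` fails unless it passes the sentinel explicitly; under the lenient rule the same call succeeds. Reserving an empty slot is the strict compare-and-swap from NULL to the sentinel in both cases.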
Re: [PATCH 0/7] libnvdimm/pfn: Fix section-alignment padding
Dan Williams writes:

> On Tue, Feb 12, 2019 at 1:37 PM Dan Williams wrote:
>>
>> Lately Linux has encountered platforms that collide Persistent Memory
>> regions between each other, specifically cases where ->start_pad needed
>> to be non-zero. This led to commit ae86cbfef381 "libnvdimm, pfn: Pad
>> pfn namespaces relative to other regions". That commit allowed
>> namespaces to be mapped with devm_memremap_pages(). However dax
>> operations on those configurations currently fail if attempted within the
>> ->start_pad range because pmem_device->data_offset was still relative to
>> raw resource base not relative to the section aligned resource range
>> mapped by devm_memremap_pages().
>>
>> Luckily __bdev_dax_supported() caught these failures and simply disabled
>> dax. However, to fix this situation a non-backwards compatible change
>> needs to be made to the interpretation of the nd_pfn info-block.
>> ->start_pad needs to be accounted in ->map.map_offset (formerly
>> ->data_offset), and ->map.map_base (formerly ->phys_addr) needs to be
>> adjusted to the section aligned resource base used to establish
>> ->map.map (formerly ->virt_addr).
>>
>> See patch 7 "libnvdimm/pfn: Fix 'start_pad' implementation" for more
>> details, and the ndctl patch series "Improve support + testing for
>> labels + info-blocks" for the corresponding regression test.
>
> Hello valued reviewers, can I plead for a sanity check of at least
> "libnvdimm/pfn: Introduce super-block minimum version requirements"
> and "libnvdimm/pfn: Fix 'start_pad' implementation"? In particular
> Jeff / Johannes this has end user / distro impact in that users may
> lose access to namespaces that are upgraded to v1.3 info-blocks and
> then boot an old kernel. I did not see a way around that sharp edge.

Yes, I'll take a look.

Cheers,
Jeff
Re: [PATCH 02/10] powerpc/603: Store PGDIR physical address in a SPRG
Le 25/01/2019 à 13:34, Christophe Leroy a écrit : Use SPRN_SPRG5 to store the current thread PGDIR and avoid reading thread_struct->pgdir at every TLB miss. I'll send out v2 with an additional patch getting rid of SPRN_SPRG_RTAS hence freeing SPRN_SPRG2 which I will use here instead of SPRN_SPRG5 so that all 6xx will benefit. Christophe Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kernel/cpu_setup_6xx.S | 4 arch/powerpc/kernel/head_32.S | 28 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 1c98ef1f2d5b..ba0ab1a1431b 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1169,6 +1169,7 @@ #define SPRN_SPRG_SCRATCH1SPRN_SPRG1 #define SPRN_SPRG_RTASSPRN_SPRG2 #define SPRN_SPRG_603_LRU SPRN_SPRG4 +#define SPRN_SPRG_603_PGDIRSPRN_SPRG5 #endif #ifdef CONFIG_40x diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S index 8c069e96c478..4c91d1f640fe 100644 --- a/arch/powerpc/kernel/cpu_setup_6xx.S +++ b/arch/powerpc/kernel/cpu_setup_6xx.S @@ -24,6 +24,10 @@ BEGIN_MMU_FTR_SECTION li r10,0 mtspr SPRN_SPRG_603_LRU,r10 /* init SW LRU tracking */ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU) + lis r10, (swapper_pg_dir - PAGE_OFFSET)@h + ori r10, r10, (swapper_pg_dir - PAGE_OFFSET)@l + mtspr SPRN_SPRG_603_PGDIR, r10 + BEGIN_FTR_SECTION bl __init_fpu_registers END_FTR_SECTION_IFCLR(CPU_FTR_FPU_UNAVAILABLE) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index c2f564690778..dbd15e03952a 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -502,16 +502,15 @@ InstructionTLBMiss: mfspr r3,SPRN_IMISS lis r1,PAGE_OFFSET@h/* check if kernel address */ cmplw 0,r1,r3 - mfspr r2,SPRN_SPRG_THREAD + mfspr r2, SPRN_SPRG_603_PGDIR li r1,_PAGE_USER|_PAGE_PRESENT|_PAGE_EXEC /* low addresses tested as user */ - lwz r2,PGDIR(r2) bge-112f mfspr 
r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */ rlwimi r1,r2,32-12,29,29 /* shift MSR_PR to _PAGE_USER posn */ lis r2,swapper_pg_dir@ha/* if kernel address, use */ addir2,r2,swapper_pg_dir@l /* kernel page table */ -112: tophys(r2,r2) - rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ + tophys(r2,r2) +112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ lwz r2,0(r2)/* get pmd entry */ rlwinm. r2,r2,0,0,19/* extract address of pte page */ beq-InstructionAddressInvalid /* return if no mapping */ @@ -576,16 +575,15 @@ DataLoadTLBMiss: mfspr r3,SPRN_DMISS lis r1,PAGE_OFFSET@h/* check if kernel address */ cmplw 0,r1,r3 - mfspr r2,SPRN_SPRG_THREAD + mfspr r2, SPRN_SPRG_603_PGDIR li r1,_PAGE_USER|_PAGE_PRESENT /* low addresses tested as user */ - lwz r2,PGDIR(r2) bge-112f mfspr r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */ rlwimi r1,r2,32-12,29,29 /* shift MSR_PR to _PAGE_USER posn */ lis r2,swapper_pg_dir@ha/* if kernel address, use */ addir2,r2,swapper_pg_dir@l /* kernel page table */ -112: tophys(r2,r2) - rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ + tophys(r2,r2) +112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ lwz r2,0(r2)/* get pmd entry */ rlwinm. r2,r2,0,0,19/* extract address of pte page */ beq-DataAddressInvalid /* return if no mapping */ @@ -660,16 +658,15 @@ DataStoreTLBMiss: mfspr r3,SPRN_DMISS lis r1,PAGE_OFFSET@h/* check if kernel address */ cmplw 0,r1,r3 - mfspr r2,SPRN_SPRG_THREAD + mfspr r2, SPRN_SPRG_603_PGDIR li r1,_PAGE_RW|_PAGE_USER|_PAGE_PRESENT /* access flags */ - lwz r2,PGDIR(r2) bge-112f mfspr r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */ rlwimi r1,r2,32-12,29,29 /* shift MSR_PR to _PAGE_USER posn */ lis r2,swapper_pg_dir@ha/* if kernel address, use */ addir2,r2,swapper_pg_dir@l /* kernel page table */ -112: tophys(r2,r2) - rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ + tophys(r2,r2) +112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */ lwz r2,0(r2)/* get pmd entry */ rlwinm. 
r2,r2,0,0,19/* extract address of pte
Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case
Hi Steve, On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote: > On Wed, 13 Feb 2019 17:42:55 -0600 > Tom Zanussi wrote: > > > From: Tom Zanussi > > > > Add a test case verifying that basic action combinations fail as > > expected. > > > > Hi Tom, > > This test appears to fail: > > # echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger > -bash: echo: write error: Invalid argument > > # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist > > ERROR: action parsing: Handler doesn't support action: save > Last command: keys=comm:onmatch(sched.sched_wakeup).save(comm,prio) > > > Is the "save" feature implemented here? It's in the README too. > Should > it be removed? > The "save" feature is implemented, but it's not currently supported with onmatch(), which is why it fails, and is used in the xfail test, since it's expected to. So, in this case, the command fails, which means the xfail test actually passed. ;-) There are other tests in the inter-event testcases that use save() but with onmax() and onchange(), and they pass. 
Hope that explains things in this case, Tom > -- Steve > > > Signed-off-by: Tom Zanussi > > --- > > .../inter-event/trigger-action-hist-xfail.tc | 30 > > ++ > > 1 file changed, 30 insertions(+) > > create mode 100644 > > tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger- > > action-hist-xfail.tc > > > > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter- > > event/trigger-action-hist-xfail.tc > > b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > event/trigger-action-hist-xfail.tc > > new file mode 100644 > > index ..1221240f8cf6 > > --- /dev/null > > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter- > > event/trigger-action-hist-xfail.tc > > @@ -0,0 +1,30 @@ > > +#!/bin/sh > > +# SPDX-License-Identifier: GPL-2.0 > > +# description: event trigger - test inter-event histogram trigger > > expected fail actions > > + > > +fail() { #msg > > +echo $1 > > +exit_fail > > +} > > + > > +if [ ! -f set_event ]; then > > +echo "event tracing is not supported" > > +exit_unsupported > > +fi > > + > > +if [ ! -f snapshot ]; then > > +echo "snapshot is not supported" > > +exit_unsupported > > +fi > > + > > +grep -q "snapshot()" README || exit_unsupported # version issue > > + > > +echo "Test expected snapshot action failure" > > + > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >> > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && > > exit_fail > > + > > +echo "Test expected save action failure" > > + > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' > > >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && > > exit_fail > > + > > +exit_xfail > >
Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier
On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin wrote:
> On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov wrote:
>
> > I would have to do some more research, but I expect several hundred
> > patches before we get to a clean randconfig build with a broken
> > compiler.
>
> Manually maintaining asan-stack parameter for the sake of one broken compiler
> isn't a great idea either.
>
> Couple alternative suggestions:
>
> 1) If we can't fix the problem or the cost of fixing is too high, maybe just
> hide it? Disable -Wframe-larger-than on pre clang-9 compilers.
>
> 2) Fallback cflags. The idea is to try to compile every file with "-mllvm
> -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
> and fall back to "-mllvm -asan-stack=0" if that fails. So it would be something
> similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048
> -Werror, -mllvm -asan-stack=0)
> except that "cc-option" tries options only once on some code example while
> we need to try options on every file that we actually compile.
> Honestly, I'm not sure that it's worth hacking the Kbuild engine for that
> particular use-case.

My original plan was to put this under CONFIG_KASAN_EXTRA to allow you to still enable it in older compilers, but you just removed that option ;-)

Maybe bringing it back would be a compromise? That way it's hidden from all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency), but anyone who really wants it can still have the option, and set CONFIG_FRAME_WARN to whichever value they like.

Arnd
[PATCH v3 00/16] powerpc/32: Use BATs/LTLBs for STRICT_KERNEL_RWX
The purpose of this series is to:
- use BATs with STRICT_KERNEL_RWX on book3s (See patch 13 for details.)
- use LTLBs with STRICT_KERNEL_RWX on 8xx (See patch 15 for a few details.)

v3:
- Reordered to avoid build failure due to setibat() not being used for several steps in the series. Now the patch using setibat() is next to the one adding setibat().
- Fixed mmu_mapin_ram() in patch 3 to return base in all cases, thanks Jonathan for the test
- Fixed build failure on 8xx when CONFIG_PERF_EVENTS is set due to too many instructions in Exception 0x1200
- Made 8M alignment for data the default on 8xx when STRICT_KERNEL_RWX is selected.
- Added patch 1 to not set additional BAT on the wii when requesting nobats. The only purpose of this patch is to be backported, as this function is removed later in the series.

v2:
- Fix patch 2 (was patch 3 in v1) based on feedback from Jonathan.
- Added support for 8xx with LTLBs.
- Added systematic population of pagetables for Abatron BDI.

Christophe Leroy (16):
  powerpc/wii: properly disable use of BATs when requested.
  powerpc/mm/32: add base address to mmu_mapin_ram()
  powerpc/mm/32s: rework mmu_mapin_ram()
  powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
  powerpc/32: always populate page tables for Abatron BDI.
  powerpc/wii: remove wii_mmu_mapin_mem2()
  powerpc/mm/32s: use _PAGE_EXEC in setbat()
  powerpc/32: add helper to write into segment registers
  powerpc/mmu: add is_strict_kernel_rwx() helper
  powerpc/kconfig: define PAGE_SHIFT inside Kconfig
  powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT
  powerpc/mm/32s: add setibat() clearibat() and update_bats()
  powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX
  powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32
  powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX
  powerpc/kconfig: make _etext and data areas alignment configurable on 8xx

 arch/powerpc/Kconfig                          |  60 +
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |   2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h  |  11 ++
 arch/powerpc/include/asm/mmu.h                |  11 ++
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h  |   3 +-
 arch/powerpc/include/asm/page.h               |  13 +-
 arch/powerpc/include/asm/reg.h                |   5 +
 arch/powerpc/kernel/head_32.S                 |  35 +
 arch/powerpc/kernel/head_8xx.S                |  54 ++--
 arch/powerpc/kernel/vmlinux.lds.S             |   9 +-
 arch/powerpc/mm/40x_mmu.c                     |   2 +-
 arch/powerpc/mm/44x_mmu.c                     |   2 +-
 arch/powerpc/mm/8xx_mmu.c                     |  33 -
 arch/powerpc/mm/fsl_booke_mmu.c               |   2 +-
 arch/powerpc/mm/init_32.c                     |   6 +-
 arch/powerpc/mm/mmu_decl.h                    |  10 +-
 arch/powerpc/mm/pgtable_32.c                  |  38 +++---
 arch/powerpc/mm/ppc_mmu_32.c                  | 180 ++
 arch/powerpc/platforms/embedded6xx/wii.c      |  24 
 19 files changed, 390 insertions(+), 110 deletions(-)

-- 
2.13.3
[PATCH v3 05/16] powerpc/32: always populate page tables for Abatron BDI.
When CONFIG_BDI_SWITCH is set, the page tables have to be populated although large TLBs are used, because the BDI switch knows nothing about those large TLBs, which are handled directly in the TLB miss logic.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/pgtable_32.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index fd665c32a1f7..94bd7d013557 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -261,7 +261,10 @@ void __init mapin_ram(void)
 		unsigned long top = base + reg->size;
 
 		base = mmu_mapin_ram(base, top);
-		__mapin_ram_chunk(base, top);
+		if (IS_ENABLED(CONFIG_BDI_SWITCH))
+			__mapin_ram_chunk(reg->base, top);
+		else
+			__mapin_ram_chunk(base, top);
 	}
 }
 
-- 
2.13.3
[PATCH v3 02/16] powerpc/mm/32: add base address to mmu_mapin_ram()
At the time being, mmu_mapin_ram() always maps RAM from the beginning. But some platforms like the WII have to map a second block of RAM. This patch adds to mmu_mapin_ram() the base address of the block. At the moment, only base address 0 is supported. Signed-off-by: Christophe Leroy --- arch/powerpc/mm/40x_mmu.c | 2 +- arch/powerpc/mm/44x_mmu.c | 2 +- arch/powerpc/mm/8xx_mmu.c | 2 +- arch/powerpc/mm/fsl_booke_mmu.c | 2 +- arch/powerpc/mm/mmu_decl.h | 2 +- arch/powerpc/mm/pgtable_32.c| 6 +++--- arch/powerpc/mm/ppc_mmu_32.c| 2 +- 7 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c index 61ac468c87c6..b9cf6f8764b0 100644 --- a/arch/powerpc/mm/40x_mmu.c +++ b/arch/powerpc/mm/40x_mmu.c @@ -93,7 +93,7 @@ void __init MMU_init_hw(void) #define LARGE_PAGE_SIZE_16M(1<<24) #define LARGE_PAGE_SIZE_4M (1<<22) -unsigned long __init mmu_mapin_ram(unsigned long top) +unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { unsigned long v, s, mapped; phys_addr_t p; diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c index ea2b9af08a48..aad127acdbaa 100644 --- a/arch/powerpc/mm/44x_mmu.c +++ b/arch/powerpc/mm/44x_mmu.c @@ -170,7 +170,7 @@ void __init MMU_init_hw(void) flush_instruction_cache(); } -unsigned long __init mmu_mapin_ram(unsigned long top) +unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { unsigned long addr; unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1); diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c index e2c32bdb6023..46bc26ef71e9 100644 --- a/arch/powerpc/mm/8xx_mmu.c +++ b/arch/powerpc/mm/8xx_mmu.c @@ -99,7 +99,7 @@ static void __init mmu_patch_cmp_limit(s32 *site, unsigned long mapped) modify_instruction_site(site, 0x, (unsigned long)__va(mapped) >> 16); } -unsigned long __init mmu_mapin_ram(unsigned long top) +unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { unsigned long mapped; 
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c index 080d49b26c3a..210cbc1faf63 100644 --- a/arch/powerpc/mm/fsl_booke_mmu.c +++ b/arch/powerpc/mm/fsl_booke_mmu.c @@ -221,7 +221,7 @@ unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun) #error "LOWMEM_CAM_NUM must be less than NUM_TLBCAMS" #endif -unsigned long __init mmu_mapin_ram(unsigned long top) +unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { return tlbcam_addrs[tlbcam_index - 1].limit - PAGE_OFFSET + 1; } diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index c4a717da65eb..61730023dde3 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -130,7 +130,7 @@ extern void wii_memory_fixups(void); */ #ifdef CONFIG_PPC32 extern void MMU_init_hw(void); -extern unsigned long mmu_mapin_ram(unsigned long top); +unsigned long mmu_mapin_ram(unsigned long base, unsigned long top); #endif #ifdef CONFIG_PPC_FSL_BOOK3E diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index ded71126ce4c..b4858818523f 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -258,15 +258,15 @@ void __init mapin_ram(void) #ifndef CONFIG_WII top = total_lowmem; - s = mmu_mapin_ram(top); + s = mmu_mapin_ram(0, top); __mapin_ram_chunk(s, top); #else if (!wii_hole_size) { - s = mmu_mapin_ram(total_lowmem); + s = mmu_mapin_ram(0, total_lowmem); __mapin_ram_chunk(s, total_lowmem); } else { top = wii_hole_start; - s = mmu_mapin_ram(top); + s = mmu_mapin_ram(0, top); __mapin_ram_chunk(s, top); top = memblock_end_of_DRAM(); diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c index 3f4193201ee7..b260ced065b4 100644 --- a/arch/powerpc/mm/ppc_mmu_32.c +++ b/arch/powerpc/mm/ppc_mmu_32.c @@ -73,7 +73,7 @@ unsigned long p_block_mapped(phys_addr_t pa) return 0; } -unsigned long __init mmu_mapin_ram(unsigned long top) +unsigned long __init mmu_mapin_ram(unsigned long 
base, unsigned long top) { unsigned long tot, bl, done; unsigned long max_size = (256<<20); -- 2.13.3
[PATCH v3 13/16] powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX
Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages.

On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely impacts performance.
- Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY)

On the 603+, we have:
- Independent IBAT and DBAT allowing limitation of exec parts.
- NX bit can be set in segment registers to forbid execution on memory mapped by pages.
- RO mode on DBATs even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user about it, because:
- BATs are common to instructions and data.
- BATs do not provide RO mode for kernel-only blocks.
- Segment registers don't have the NX bit.

In order to use IBAT for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Sets NX bit in kernel segment register (except on vmalloc area when CONFIG_MODULES is selected)
- Maps kernel text with IBATs.

In order to use DBAT for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps kernel RO area with write prohibited DBATs
- Maps remaining memory with remaining DBATs

Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX:

Symbols:
c000 T _stext
c068 R __start_rodata
c068 R _etext
c080 T __init_begin
c080 T _sinittext

~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc000-0xc03f 0x Kernel EXEC coherent
1: 0xc040-0xc05f 0x0040 Kernel EXEC coherent
2: 0xc060-0xc067 0x0060 Kernel EXEC coherent
3: -
4: -
5: -
6: -
7: -
---[ Data Block Address Translation ]---
0: 0xc000-0xc07f 0x Kernel RO coherent
1: 0xc080-0xc0ff 0x0080 Kernel RW coherent
2: 0xc100-0xc1ff 0x0100 Kernel RW coherent
3: 0xc200-0xc3ff 0x0200 Kernel RW coherent
4: 0xc400-0xc7ff 0x0400 Kernel RW coherent
5: 0xc800-0xcfff 0x0800 Kernel RW coherent
6: 0xd000-0xdfff 0x1000 Kernel RW coherent
7: -

~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x-0x0fff Kern key 1 User key 1 VSID 0xa085d0
0x1000-0x1fff Kern key 1 User key 1 VSID 0xa086e1
0x2000-0x2fff Kern key 1 User key 1 VSID 0xa087f2
0x3000-0x3fff Kern key 1 User key 1 VSID 0xa08903
0x4000-0x4fff Kern key 1 User key 1 VSID 0xa08a14
0x5000-0x5fff Kern key 1 User key 1 VSID 0xa08b25
0x6000-0x6fff Kern key 1 User key 1 VSID 0xa08c36
0x7000-0x7fff Kern key 1 User key 1 VSID 0xa08d47
0x8000-0x8fff Kern key 1 User key 1 VSID 0xa08e58
0x9000-0x9fff Kern key 1 User key 1 VSID 0xa08f69
0xa000-0xafff Kern key 1 User key 1 VSID 0xa0907a
0xb000-0xbfff Kern key 1 User key 1 VSID 0xa0918b
---[ Kernel Segments ]---
0xc000-0xcfff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd000-0xdfff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe000-0xefff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf000-0x Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block)

Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb

Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow modifying the _etext and data alignment.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig                         |  2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 11 
 arch/powerpc/mm/init_32.c                    |  4 +-
 arch/powerpc/mm/mmu_decl.h                   |  8 +++
 arch/powerpc/mm/pgtable_32.c                 | 10 +++-
 arch/powerpc/mm/ppc_mmu_32.c                 | 87 ++--
 6 files changed, 112 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index edef40a2b446..640a7cfba9d0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -727,11 +727,13 @@ config THREAD_SHIFT
 
 config ETEXT_SHIFT
 	int
+	default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
 	default PPC_PAGE_SHIFT
 
 config DATA_SHIFT
 	int
 	default 24 if STRICT_KERNEL_RWX && PPC64
+	default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
 	default PPC_PAGE_SHIFT
 
 config FORCE_MAX_ZONEORDER
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index
[PATCH v3 06/16] powerpc/wii: remove wii_mmu_mapin_mem2()
wii_mmu_mapin_mem2() is not used anymore, remove it. Signed-off-by: Christophe Leroy --- arch/powerpc/platforms/embedded6xx/wii.c | 28 1 file changed, 28 deletions(-) diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c index ac4ee88efc80..235fe81aa2b1 100644 --- a/arch/powerpc/platforms/embedded6xx/wii.c +++ b/arch/powerpc/platforms/embedded6xx/wii.c @@ -54,10 +54,6 @@ static void __iomem *hw_ctrl; static void __iomem *hw_gpio; -unsigned long wii_hole_start; -unsigned long wii_hole_size; - - static int __init page_aligned(unsigned long x) { return !(x & (PAGE_SIZE-1)); @@ -69,30 +65,6 @@ void __init wii_memory_fixups(void) BUG_ON(memblock.memory.cnt != 2); BUG_ON(!page_aligned(p[0].base) || !page_aligned(p[1].base)); - - /* determine hole */ - wii_hole_start = ALIGN(p[0].base + p[0].size, PAGE_SIZE); - wii_hole_size = p[1].base - wii_hole_start; -} - -unsigned long __init wii_mmu_mapin_mem2(unsigned long top) -{ - unsigned long delta, size, bl; - unsigned long max_size = (256<<20); - - /* MEM2 64MB@0x1000 */ - delta = wii_hole_start + wii_hole_size; - size = top - delta; - - if (__map_without_bats) - return delta; - - for (bl = 128<<10; bl < max_size; bl <<= 1) { - if (bl * 2 > size) - break; - } - setbat(4, PAGE_OFFSET+delta, delta, bl, PAGE_KERNEL_X); - return delta + bl; } static void __noreturn wii_spin(void) -- 2.13.3
[PATCH v3 03/16] powerpc/mm/32s: rework mmu_mapin_ram()
This patch reworks mmu_mapin_ram() to be more generic and map as much blocks as possible. It now supports blocks not starting at address 0. It scans DBATs array to find free ones instead of forcing the use of BAT2 and BAT3. Signed-off-by: Christophe Leroy --- arch/powerpc/mm/ppc_mmu_32.c | 63 1 file changed, 41 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c index b260ced065b4..5fc59b195fef 100644 --- a/arch/powerpc/mm/ppc_mmu_32.c +++ b/arch/powerpc/mm/ppc_mmu_32.c @@ -73,39 +73,58 @@ unsigned long p_block_mapped(phys_addr_t pa) return 0; } +static int find_free_bat(void) +{ + int b; + + if (cpu_has_feature(CPU_FTR_601)) { + for (b = 0; b < 4; b++) { + struct ppc_bat *bat = BATS[b]; + + if (!(bat[0].batl & 0x40)) + return b; + } + } else { + int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4; + + for (b = 0; b < n; b++) { + struct ppc_bat *bat = BATS[b]; + + if (!(bat[1].batu & 3)) + return b; + } + } + return -1; +} + +static unsigned int block_size(unsigned long base, unsigned long top) +{ + unsigned int max_size = (cpu_has_feature(CPU_FTR_601) ? 8 : 256) << 20; + unsigned int base_shift = (fls(base) - 1) & 31; + unsigned int block_shift = (fls(top - base) - 1) & 31; + + return min3(max_size, 1U << base_shift, 1U << block_shift); +} + unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { - unsigned long tot, bl, done; - unsigned long max_size = (256<<20); + int idx; if (__map_without_bats) { printk(KERN_DEBUG "RAM mapped without BATs\n"); - return 0; + return base; } - /* Set up BAT2 and if necessary BAT3 to cover RAM. */ + while ((idx = find_free_bat()) != -1 && base != top) { + unsigned int size = block_size(base, top); - /* Make sure we don't map a block larger than the - smallest alignment of the physical address. 
*/ - tot = top; - for (bl = 128<<10; bl < max_size; bl <<= 1) { - if (bl * 2 > tot) + if (size < 128 << 10) break; + setbat(idx, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X); + base += size; } - setbat(2, PAGE_OFFSET, 0, bl, PAGE_KERNEL_X); - done = (unsigned long)bat_addrs[2].limit - PAGE_OFFSET + 1; - if ((done < tot) && !bat_addrs[3].limit) { - /* use BAT3 to cover a bit more */ - tot -= done; - for (bl = 128<<10; bl < max_size; bl <<= 1) - if (bl * 2 > tot) - break; - setbat(3, PAGE_OFFSET+done, done, bl, PAGE_KERNEL_X); - done = (unsigned long)bat_addrs[3].limit - PAGE_OFFSET + 1; - } - - return done; + return base; } /* -- 2.13.3
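For readers who want to check the block arithmetic, the greedy split that the reworked mmu_mapin_ram() loop performs can be modelled in userspace as below. This is a sketch under stated assumptions, not the patch's code: next_block() and count_blocks() are hypothetical names, the 601's smaller maximum block size is ignored, and the alignment limit is taken from the lowest set bit of base, which is this sketch's own choice and differs from the fls(base) call in the posted hunk.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_128K (128UL << 10)	/* smallest BAT block */
#define SZ_256M (256UL << 20)	/* largest BAT block (non-601) */

/* Largest power of two that is <= x; x must be non-zero. */
static unsigned long pow2_floor(unsigned long x)
{
	unsigned long p = 1;

	assert(x != 0);
	while (p <= x / 2)
		p <<= 1;
	return p;
}

/*
 * Size of the next block for [base, top): limited by the remaining
 * length, by the natural alignment of base (assumption: lowest set
 * bit; base 0 counts as maximally aligned) and by the 256M maximum.
 */
static unsigned long next_block(unsigned long base, unsigned long top)
{
	unsigned long size, align;

	assert(top > base);
	size = pow2_floor(top - base);
	align = base ? (base & -base) : SZ_256M;
	if (size > SZ_256M)
		size = SZ_256M;
	if (size > align)
		size = align;
	return size;
}

/*
 * Hypothetical model of the loop: use at most nbats blocks, stop when
 * the remainder is smaller than 128k. Returns the number of blocks
 * used and stores the first unmapped address in *mapped.
 */
static int count_blocks(unsigned long base, unsigned long top, int nbats,
			unsigned long *mapped)
{
	int used = 0;

	assert(mapped != NULL);
	while (used < nbats && base != top) {
		unsigned long size = next_block(base, top);

		if (size < SZ_128K)
			break;
		base += size;
		used++;
	}
	*mapped = base;
	return used;
}
```

Under these assumptions a 6.5Mb region starting at 0 splits into 4Mb + 2Mb + 512kb (three blocks), and a region of 32Mb minus 128kb needs all eight blocks from 16Mb down to 128kb. Anything left below 128kb is returned to the caller, which maps it with regular pages.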
[PATCH v3 09/16] powerpc/mmu: add is_strict_kernel_rwx() helper
Add a helper to know whether STRICT_KERNEL_RWX is enabled. This is based on rodata_enabled flag which is defined only when CONFIG_STRICT_KERNEL_RWX is selected. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/mmu.h | 11 +++ arch/powerpc/mm/init_32.c | 4 +--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 6d22a8e78fe2..d34ad1657d7b 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -289,6 +289,17 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address) } #endif /* CONFIG_PPC_MEM_KEYS */ +#ifdef CONFIG_STRICT_KERNEL_RWX +static inline bool strict_kernel_rwx_enabled(void) +{ + return rodata_enabled; +} +#else +static inline bool strict_kernel_rwx_enabled(void) +{ + return false; +} +#endif #endif /* !__ASSEMBLY__ */ /* The kernel use the constants below to index in the page sizes array. diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c index 3e59e5d64b01..ee5a430b9a18 100644 --- a/arch/powerpc/mm/init_32.c +++ b/arch/powerpc/mm/init_32.c @@ -108,12 +108,10 @@ static void __init MMU_setup(void) __map_without_bats = 1; __map_without_ltlbs = 1; } -#ifdef CONFIG_STRICT_KERNEL_RWX - if (rodata_enabled) { + if (strict_kernel_rwx_enabled()) { __map_without_bats = 1; __map_without_ltlbs = 1; } -#endif } /* -- 2.13.3
[PATCH v3 14/16] powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32
Depending on the number of available BATs for mapping the different kernel areas, it might be needed to increase the alignment of _etext and/or of data areas. This patchs allows the user to do it via Kconfig. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 640a7cfba9d0..20c4e3a62b90 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -725,16 +725,44 @@ config THREAD_SHIFT Used to define the stack size. The default is almost always what you want. Only change this if you know what you are doing. +config ETEXT_SHIFT_BOOL + bool "Set custom etext alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32 + depends on ADVANCED_OPTIONS + help + This option allows you to set the kernel end of text alignment. When + RAM is mapped by blocks, the alignment needs to fit the size and + number of possible blocks. The default should be OK for most configs. + + Say N here unless you know what you are doing. + config ETEXT_SHIFT - int + int "_etext shift" if ETEXT_SHIFT_BOOL + range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32 default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32 default PPC_PAGE_SHIFT + help + On Book3S 32 (603+), IBATs are used to map kernel text. + Smaller is the alignment, greater is the number of necessary IBATs. + +config DATA_SHIFT_BOOL + bool "Set custom data alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32 + depends on ADVANCED_OPTIONS + help + This option allows you to set the kernel data alignment. When + RAM is mapped by blocks, the alignment needs to fit the size and + number of possible blocks. The default should be OK for most configs. + + Say N here unless you know what you are doing. 
config DATA_SHIFT - int + int "Data shift" if DATA_SHIFT_BOOL default 24 if STRICT_KERNEL_RWX && PPC64 + range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32 default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32 default PPC_PAGE_SHIFT + help + On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO. + Smaller is the alignment, greater is the number of necessary DBATs. config FORCE_MAX_ZONEORDER int "Maximum zone order" -- 2.13.3
[PATCH v3 11/16] powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT
CONFIG_STRICT_KERNEL_RWX requires a special alignment for DATA for some subarches. Today it is just defined as an #ifdef in vmlinux.lds.S.

In order to get more flexibility, this patch moves the definition of this alignment into Kconfig.

On some subarches, CONFIG_STRICT_KERNEL_RWX will require a special alignment of _etext.

This patch also adds a configuration item for it in Kconfig.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig              | 9 +
 arch/powerpc/kernel/vmlinux.lds.S | 9 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 417e52a27f63..edef40a2b446 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -725,6 +725,15 @@ config THREAD_SHIFT
 	  Used to define the stack size. The default is almost always what you
 	  want. Only change this if you know what you are doing.
 
+config ETEXT_SHIFT
+	int
+	default PPC_PAGE_SHIFT
+
+config DATA_SHIFT
+	int
+	default 24 if STRICT_KERNEL_RWX && PPC64
+	default PPC_PAGE_SHIFT
+
 config FORCE_MAX_ZONEORDER
 	int "Maximum zone order"
 	range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index c3efb972c8c1..060a1acd7c6d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -12,11 +12,8 @@
 #include
 #include
 
-#if defined(CONFIG_STRICT_KERNEL_RWX) && !defined(CONFIG_PPC32)
-#define STRICT_ALIGN_SIZE	(1 << 24)
-#else
-#define STRICT_ALIGN_SIZE	PAGE_SIZE
-#endif
+#define STRICT_ALIGN_SIZE	(1 << CONFIG_DATA_SHIFT)
+#define ETEXT_ALIGN_SIZE	(1 << CONFIG_ETEXT_SHIFT)
 
 ENTRY(_stext)
 
@@ -131,7 +128,7 @@ SECTIONS
 
 	} :kernel
 
-	. = ALIGN(PAGE_SIZE);
+	. = ALIGN(ETEXT_ALIGN_SIZE);
 	_etext = .;
 	PROVIDE32 (etext = .);
 
-- 
2.13.3
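As an aside for readers, the effect of ". = ALIGN(1 << CONFIG_..._SHIFT)" in the linker script is plain round-up arithmetic. The sketch below models it in C; align_up_shift() is a hypothetical name, and the addresses used with it afterwards are made-up illustrations, not values taken from a real build.

```c
#include <assert.h>

/*
 * Round addr up to the next multiple of (1 << shift), which is what
 * the linker's ALIGN(1 << shift) does to the location counter.
 */
static unsigned long align_up_shift(unsigned long addr, unsigned int shift)
{
	unsigned long size;

	assert(shift < sizeof(unsigned long) * 8);
	size = 1UL << shift;
	return (addr + size - 1) & ~(size - 1);
}
```

For instance, with a shift of 17 (128kb) an end-of-text address of 0xc0671234 would be rounded up to 0xc0680000, and with a shift of 22 (4Mb) an address of 0xc0680001 would be rounded up to 0xc0800000. Larger shifts cost more padding but allow larger mapping blocks.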
[PATCH v3 16/16] powerpc/kconfig: make _etext and data areas alignment configurable on 8xx
On 8xx, large pages (512kb or 8M) are used to map kernel linear memory. Aligning to 8M reduces TLB misses as only 8M pages are used in that case. We make 8M the default for data.

This patch allows the user to set the alignment via Kconfig.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig | 18 +++---
 arch/powerpc/kernel/head_8xx.S | 4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c4d6c97d7699..cf30a8f522b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -726,7 +726,8 @@ config THREAD_SHIFT
 	  want. Only change this if you know what you are doing.
 
 config ETEXT_SHIFT_BOOL
-	bool "Set custom etext alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	bool "Set custom etext alignment" if STRICT_KERNEL_RWX && \
+	    (PPC_BOOK3S_32 || PPC_8xx)
 	depends on ADVANCED_OPTIONS
 	help
 	  This option allows you to set the kernel end of text alignment. When
@@ -738,6 +739,7 @@ config ETEXT_SHIFT_BOOL
 config ETEXT_SHIFT
 	int "_etext shift" if ETEXT_SHIFT_BOOL
 	range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
 	default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
 	default 19 if STRICT_KERNEL_RWX && PPC_8xx
 	default PPC_PAGE_SHIFT
@@ -745,8 +747,13 @@ config ETEXT_SHIFT
 	  On Book3S 32 (603+), IBATs are used to map kernel text.
 	  Smaller is the alignment, greater is the number of necessary IBATs.
 
+	  On 8xx, large pages (512kb or 8M) are used to map kernel linear
+	  memory. Aligning to 8M reduces TLB misses as only 8M pages are used
+	  in that case.
+
 config DATA_SHIFT_BOOL
-	bool "Set custom data alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	bool "Set custom data alignment" if STRICT_KERNEL_RWX && \
+	    (PPC_BOOK3S_32 || PPC_8xx)
 	depends on ADVANCED_OPTIONS
 	help
 	  This option allows you to set the kernel data alignment. When
@@ -759,13 +766,18 @@ config DATA_SHIFT
 	int "Data shift" if DATA_SHIFT_BOOL
 	default 24 if STRICT_KERNEL_RWX && PPC64
 	range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
 	default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-	default 19 if STRICT_KERNEL_RWX && PPC_8xx
+	default 23 if STRICT_KERNEL_RWX && PPC_8xx
 	default PPC_PAGE_SHIFT
 	help
 	  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
 	  Smaller is the alignment, greater is the number of necessary DBATs.
 
+	  On 8xx, large pages (512kb or 8M) are used to map kernel linear
+	  memory. Aligning to 8M reduces TLB misses as only 8M pages are used
+	  in that case.
+
 config FORCE_MAX_ZONEORDER
 	int "Maximum zone order"
 	range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01ed8f3c95c8..63f1b7eec3f0 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -416,7 +416,7 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
 	mtcr	r11
-#ifdef CONFIG_STRICT_KERNEL_RWX
+#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_ETEXT_SHIFT < 23
 	patch_site	0f, patch__itlbmiss_linmem_top8
 
 	mfspr	r10, SPRN_SRR0
@@ -537,7 +537,7 @@ DTLBMissIMMR:
 DTLBMissLinear:
 	mtcr	r11
 	rlwinm	r10, r10, 20, 0x0f80	/* 8xx supports max 256Mb RAM */
-#ifdef CONFIG_STRICT_KERNEL_RWX
+#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_DATA_SHIFT < 23
 	patch_site	0f, patch__dtlbmiss_romem_top8
 
 0:	subis	r11, r10, (PAGE_OFFSET - 0x8000)@ha
-- 
2.13.3
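To illustrate why an 8M alignment lets the handlers skip the 512k path (the "< 23" checks above), here is a rough C model of the idea only. It is not the handler logic, which works on patched instructions in assembly; needs_512k_page() and the SZ_* constants are hypothetical names, and the rule shown is a simplification: an address needs a smaller page only when its 8M frame also contains a protection boundary that is not 8M aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_512K	(1u << 19)	/* smallest large page on 8xx */
#define SZ_8M	(1u << 23)	/* biggest large page on 8xx */

/*
 * Hypothetical model: return true when the 8M frame holding addr also
 * holds a protection boundary (for example _etext) that is not 8M
 * aligned, so that a single 8M page would mix two protections.
 */
static bool needs_512k_page(uint32_t addr, uint32_t boundary)
{
	uint32_t frame = addr & ~(SZ_8M - 1);

	if ((boundary & (SZ_8M - 1)) == 0)
		return false;	/* 8M-aligned boundary: 8M pages always fit */

	return (boundary & ~(SZ_8M - 1)) == frame;
}
```

With the boundary on an 8M multiple, which is what a shift of 23 guarantees, the function never returns true, which matches the patch compiling the 512k handling out in that case.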
[PATCH v3 12/16] powerpc/mm/32s: add setibat() clearibat() and update_bats()
setibat() and clearibat() allow manipulating IBATs independently of DBATs.

update_bats() allows updating the BATs after init. This is done with the MMU off.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h | 2 ++
 arch/powerpc/kernel/head_32.S | 35 +++
 arch/powerpc/mm/ppc_mmu_32.c | 32
 3 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 0c261ba2c826..5cb588395fdc 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -92,6 +92,8 @@ typedef struct {
 	unsigned long vdso_base;
 } mm_context_t;
 
+void update_bats(void);
+
 /* patch sites */
 extern s32 patch__hash_page_A0, patch__hash_page_A1, patch__hash_page_A2;
 extern s32 patch__hash_page_B, patch__hash_page_C;
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index c2f564690778..91b302b0797f 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1104,6 +1104,41 @@ BEGIN_MMU_FTR_SECTION
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
 	blr
 
+_ENTRY(update_bats)
+	lis	r4, 1f@h
+	ori	r4, r4, 1f@l
+	tophys(r4, r4)
+	mfmsr	r6
+	mflr	r7
+	li	r3, MSR_KERNEL & ~(MSR_IR | MSR_DR)
+	rlwinm	r0, r6, 0, ~MSR_RI
+	rlwinm	r0, r0, 0, ~MSR_EE
+	mtmsr	r0
+	mtspr	SPRN_SRR0, r4
+	mtspr	SPRN_SRR1, r3
+	SYNC
+	RFI
+1:	bl	clear_bats
+	lis	r3, BATS@ha
+	addi	r3, r3, BATS@l
+	tophys(r3, r3)
+	LOAD_BAT(0, r3, r4, r5)
+	LOAD_BAT(1, r3, r4, r5)
+	LOAD_BAT(2, r3, r4, r5)
+	LOAD_BAT(3, r3, r4, r5)
+BEGIN_MMU_FTR_SECTION
+	LOAD_BAT(4, r3, r4, r5)
+	LOAD_BAT(5, r3, r4, r5)
+	LOAD_BAT(6, r3, r4, r5)
+	LOAD_BAT(7, r3, r4, r5)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
+	li	r3, MSR_KERNEL & ~(MSR_IR | MSR_DR | MSR_RI)
+	mtmsr	r3
+	mtspr	SPRN_SRR0, r7
+	mtspr	SPRN_SRR1, r6
+	SYNC
+	RFI
+
 flush_tlbs:
 	lis	r10, 0x40
 1:	addic.	r10, r10, -0x1000
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index ff8580c6ab11..66f1319e8e20 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -106,6 +106,38 @@ static unsigned int block_size(unsigned long base, unsigned long top)
 	return min3(max_size, 1U << base_shift, 1U << block_shift);
 }
 
+/*
+ * Set up one of the IBAT (block address translation) register pairs.
+ * The parameters are not checked; in particular size must be a power
+ * of 2 between 128k and 256M.
+ * Only for 603+ ...
+ */
+static void setibat(int index, unsigned long virt, phys_addr_t phys,
+		    unsigned int size, pgprot_t prot)
+{
+	unsigned int bl = (size >> 17) - 1;
+	int wimgxpp;
+	struct ppc_bat *bat = BATS[index];
+	unsigned long flags = pgprot_val(prot);
+
+	if (!cpu_has_feature(CPU_FTR_NEED_COHERENT))
+		flags &= ~_PAGE_COHERENT;
+
+	wimgxpp = (flags & _PAGE_COHERENT) | (_PAGE_EXEC ? BPP_RX : BPP_XX);
+	bat[0].batu = virt | (bl << 2) | 2; /* Vs=1, Vp=0 */
+	bat[0].batl = BAT_PHYS_ADDR(phys) | wimgxpp;
+	if (flags & _PAGE_USER)
+		bat[0].batu |= 1;	/* Vp = 1 */
+}
+
+static void clearibat(int index)
+{
+	struct ppc_bat *bat = BATS[index];
+
+	bat[0].batu = 0;
+	bat[0].batl = 0;
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
 	int idx;
-- 
2.13.3
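As a worked example of the upper BAT word built in setibat(), the following standalone C sketch repeats the arithmetic from the patch: bl = (size >> 17) - 1, then virt | (bl << 2) | Vs, plus Vp for user mappings. The function name bat_upper() and the asserts are illustrative assumptions; the kernel code deliberately does not check its parameters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the upper BAT word the way setibat()
 * does. size must be a power of 2 between 128k and 256M.
 */
static uint32_t bat_upper(uint32_t virt, uint32_t size, int user)
{
	uint32_t bl;

	assert(size >= (1u << 17) && size <= (1u << 28));
	assert((size & (size - 1)) == 0);	/* power of two */

	bl = (size >> 17) - 1;			/* block length in 128k units, minus one */
	return virt | (bl << 2) | 2u		/* Vs = 1 */
		    | (user ? 1u : 0u);		/* Vp = 1 for user mappings */
}
```

For a 128k block at 0xc0000000 this gives 0xc0000002, and for a 256M block 0xc0001ffe, showing how a larger block simply sets more bits in the BL field.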
[PATCH v3 15/16] powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX
This patch implements handling of STRICT_KERNEL_RWX with large TLBs directly in the TLB miss handlers.

To do so, etext and sinittext are aligned on 512kB boundaries and the miss handlers use 512kB pages instead of 8MB pages for addresses close to the boundaries.

It sets RO PP flags for addresses under sinittext.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig | 2 ++
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 3 +-
 arch/powerpc/kernel/head_8xx.S | 54 +---
 arch/powerpc/mm/8xx_mmu.c | 31 +++
 arch/powerpc/mm/init_32.c | 2 +-
 arch/powerpc/mm/mmu_decl.h | 2 +-
 6 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 20c4e3a62b90..c4d6c97d7699 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -739,6 +739,7 @@ config ETEXT_SHIFT
 	int "_etext shift" if ETEXT_SHIFT_BOOL
 	range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
 	default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	default 19 if STRICT_KERNEL_RWX && PPC_8xx
 	default PPC_PAGE_SHIFT
 	help
 	  On Book3S 32 (603+), IBATs are used to map kernel text.
@@ -759,6 +760,7 @@ config DATA_SHIFT
 	default 24 if STRICT_KERNEL_RWX && PPC64
 	range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
 	default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+	default 19 if STRICT_KERNEL_RWX && PPC_8xx
 	default PPC_PAGE_SHIFT
 	help
 	  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index b0f764c827c0..0a1a3fc54e54 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -231,9 +231,10 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
 }
 
 /* patch sites */
-extern s32 patch__itlbmiss_linmem_top;
+extern s32 patch__itlbmiss_linmem_top, patch__itlbmiss_linmem_top8;
 extern s32 patch__dtlbmiss_linmem_top, patch__dtlbmiss_immr_jmp;
 extern s32 patch__fixupdar_linmem_top;
+extern s32 patch__dtlbmiss_romem_top, patch__dtlbmiss_romem_top8;
 
 extern s32 patch__itlbmiss_exit_1, patch__itlbmiss_exit_2;
 extern s32 patch__dtlbmiss_exit_1, patch__dtlbmiss_exit_2, patch__dtlbmiss_exit_3;
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 4a2e3ffdb5bb..01ed8f3c95c8 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,6 +292,17 @@ SystemCall:
  */
 	EXCEPTION(0x1000, SoftEmu, program_check_exception, EXC_XFER_STD)
 
+/* Called from DataStoreTLBMiss when perf TLB misses events are activated */
+#ifdef CONFIG_PERF_EVENTS
+	patch_site	0f, patch__dtlbmiss_perf
+0:	lwz	r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+	addi	r10, r10, 1
+	stw	r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+	mfspr	r10, SPRN_SPRG_SCRATCH0
+	mfspr	r11, SPRN_SPRG_SCRATCH1
+	rfi
+#endif
+
 	. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
@@ -405,10 +416,20 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
 	mtcr	r11
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	patch_site	0f, patch__itlbmiss_linmem_top8
+
+	mfspr	r10, SPRN_SRR0
+0:	subis	r11, r10, (PAGE_OFFSET - 0x8000)@ha
+	rlwinm	r11, r11, 4, MI_PS8MEG ^ MI_PS512K
+	ori	r11, r11, MI_PS512K | MI_SVALID
+	rlwinm	r10, r10, 0, 0x0ff8	/* 8xx supports max 256Mb RAM */
+#else
 	/* Set 8M byte page and mark it valid */
 	li	r11, MI_PS8MEG | MI_SVALID
-	mtspr	SPRN_MI_TWC, r11
 	rlwinm	r10, r10, 20, 0x0f80	/* 8xx supports max 256Mb RAM */
+#endif
+	mtspr	SPRN_MI_TWC, r11
 	ori	r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
 			  _PAGE_PRESENT
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
@@ -494,16 +515,6 @@ DataStoreTLBMiss:
 	rfi
 	patch_site	0b, patch__dtlbmiss_exit_1
 
-#ifdef CONFIG_PERF_EVENTS
-	patch_site	0f, patch__dtlbmiss_perf
-0:	lwz	r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-	addi	r10, r10, 1
-	stw	r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-	mfspr	r10, SPRN_SPRG_SCRATCH0
-	mfspr	r11, SPRN_SPRG_SCRATCH1
-	rfi
-#endif
-
 DTLBMissIMMR:
 	mtcr	r11
 	/* Set 512k byte guarded page and mark it valid */
@@ -525,10 +536,29 @@ DTLBMissIMMR:
 DTLBMissLinear:
 	mtcr	r11
+	rlwinm	r10, r10, 20, 0x0f80	/* 8xx supports max 256Mb RAM */
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	patch_site	0f, patch__dtlbmiss_romem_top8
+
+0:	subis	r11, r10, (PAGE_OFFSET - 0x8000)@ha
+	rlwinm	r11, r11,
[PATCH v3 07/16] powerpc/mm/32s: use _PAGE_EXEC in setbat()
Do not set IBAT when setbat() is called without _PAGE_EXEC.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/ppc_mmu_32.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 5fc59b195fef..ff8580c6ab11 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -131,6 +131,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
  * Set up one of the I/D BAT (block address translation) register pairs.
  * The parameters are not checked; in particular size must be a power
  * of 2 between 128k and 256M.
+ * On 603+, only set IBAT when _PAGE_EXEC is set
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 		   unsigned int size, pgprot_t prot)
@@ -157,11 +158,12 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 			bat[1].batu |= 1;	/* Vp = 1 */
 		if (flags & _PAGE_GUARDED) {
 			/* G bit must be zero in IBATs */
-			bat[0].batu = bat[0].batl = 0;
-		} else {
-			/* make IBAT same as DBAT */
-			bat[0] = bat[1];
+			flags &= ~_PAGE_EXEC;
 		}
+		if (flags & _PAGE_EXEC)
+			bat[0] = bat[1];
+		else
+			bat[0].batu = bat[0].batl = 0;
 	} else {
 		/* 601 cpu */
 		if (bl > BL_8M)
-- 
2.13.3
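The decision this hunk introduces can be restated as a small truth function. The C sketch below models only that decision; FLAG_GUARDED, FLAG_EXEC and ibat_wanted() are placeholder names with arbitrary bit values, not the kernel's _PAGE_* definitions.

```c
#include <stdbool.h>

/* Placeholder flag bits for illustration; not the kernel's _PAGE_* values. */
#define FLAG_GUARDED	0x1u
#define FLAG_EXEC	0x2u

/*
 * Hypothetical model of the new setbat() logic on 603+: a guarded
 * mapping can never be executable (the G bit must be zero in IBATs),
 * and an IBAT is only populated when the exec flag survives.
 */
static bool ibat_wanted(unsigned int flags)
{
	if (flags & FLAG_GUARDED)
		flags &= ~FLAG_EXEC;

	return (flags & FLAG_EXEC) != 0;
}
```

Before the patch every non-guarded mapping got an IBAT copy of its DBAT; in this model a non-guarded mapping without the exec flag now gets none.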
[PATCH v3 04/16] powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
Now that mmu_mapin_ram() is able to handle blocks other than the one starting at 0, the Wii can use it for all its blocks.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/pgtable_32.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index b4858818523f..fd665c32a1f7 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -254,26 +254,15 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 
 void __init mapin_ram(void)
 {
-	unsigned long s, top;
-
-#ifndef CONFIG_WII
-	top = total_lowmem;
-	s = mmu_mapin_ram(0, top);
-	__mapin_ram_chunk(s, top);
-#else
-	if (!wii_hole_size) {
-		s = mmu_mapin_ram(0, total_lowmem);
-		__mapin_ram_chunk(s, total_lowmem);
-	} else {
-		top = wii_hole_start;
-		s = mmu_mapin_ram(0, top);
-		__mapin_ram_chunk(s, top);
+	struct memblock_region *reg;
+
+	for_each_memblock(memory, reg) {
+		unsigned long base = reg->base;
+		unsigned long top = base + reg->size;
 
-		top = memblock_end_of_DRAM();
-		s = wii_mmu_mapin_mem2(top);
-		__mapin_ram_chunk(s, top);
+		base = mmu_mapin_ram(base, top);
+		__mapin_ram_chunk(base, top);
 	}
-#endif
 }
 
 /* Scan the real Linux page tables and return a PTE pointer for
-- 
2.13.3
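The shape of the new mapin_ram() loop can be tried outside the kernel with stand-ins. In the sketch below everything is hypothetical: struct mem_region replaces memblock regions, block_map() is a simplified stand-in for mmu_mapin_ram() that covers whole 128k blocks from an aligned base, and the function returns how many bytes would be left for __mapin_ram_chunk() to map with pages.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memblock region. */
struct mem_region {
	unsigned long base;
	unsigned long size;
};

#define BLOCK_SIZE	(128UL * 1024)

/*
 * Simplified stand-in for mmu_mapin_ram(): block-map as many whole
 * 128k blocks as fit and return the first address left uncovered.
 */
static unsigned long block_map(unsigned long base, unsigned long top)
{
	if (base & (BLOCK_SIZE - 1))
		return base;	/* unaligned start: nothing block-mapped */

	return base + ((top - base) & ~(BLOCK_SIZE - 1));
}

/* Same loop shape as the patched mapin_ram(), over an array of regions. */
static unsigned long page_mapped_bytes(const struct mem_region *regs, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long base = regs[i].base;
		unsigned long top = base + regs[i].size;

		base = block_map(base, top);
		total += top - base;	/* remainder handled page by page */
	}
	return total;
}
```

The point of the patch is visible in the loop: each region is treated the same way, so a platform with two RAM ranges needs no special case.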
[PATCH v3 08/16] powerpc/32: add helper to write into segment registers
This patch adds a helper which wraps the 'mtsrin' instruction to write into segment registers.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/reg.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..a70cbaf5c26f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1425,6 +1425,11 @@ static inline void msr_check_and_clear(unsigned long bits)
 #define mfsrin(v)	({unsigned int rval; \
 			asm volatile("mfsrin %0,%1" : "=r" (rval) : "r" (v)); \
 					rval;})
+
+static inline void mtsrin(u32 val, u32 idx)
+{
+	asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx));
+}
 #endif
 
 #define proc_trap()	asm volatile("trap")
-- 
2.13.3
[PATCH v3 10/16] powerpc/kconfig: define PAGE_SHIFT inside Kconfig
This patch defines CONFIG_PPC_PAGE_SHIFT so that the PAGE_SHIFT value can be used inside Kconfig.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig | 7 +++
 arch/powerpc/include/asm/page.h | 13 ++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 849b0d5ac3d1..417e52a27f63 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -708,6 +708,13 @@ config PPC_256K_PAGES
 
 endchoice
 
+config PPC_PAGE_SHIFT
+	int
+	default 18 if PPC_256K_PAGES
+	default 16 if PPC_64K_PAGES
+	default 14 if PPC_16K_PAGES
+	default 12
+
 config THREAD_SHIFT
 	int "Thread shift" if EXPERT
 	range 13 15
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index aa4497175bd3..ed870468ef6f 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -20,20 +20,11 @@
 
 /*
  * On regular PPC32 page size is 4K (but we support 4K/16K/64K/256K pages
- * on PPC44x). For PPC64 we support either 4K or 64K software
+ * on PPC44x and 4K/16K on 8xx). For PPC64 we support either 4K or 64K software
  * page size. When using 64K pages however, whether we are really supporting
  * 64K pages in HW or not is irrelevant to those definitions.
  */
-#if defined(CONFIG_PPC_256K_PAGES)
-#define PAGE_SHIFT		18
-#elif defined(CONFIG_PPC_64K_PAGES)
-#define PAGE_SHIFT		16
-#elif defined(CONFIG_PPC_16K_PAGES)
-#define PAGE_SHIFT		14
-#else
-#define PAGE_SHIFT		12
-#endif
-
+#define PAGE_SHIFT		CONFIG_PPC_PAGE_SHIFT
 #define PAGE_SIZE		(ASM_CONST(1) << PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
-- 
2.13.3
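The mapping that moves from the #if chain into Kconfig defaults can be written out as a small table. The C sketch below only mirrors the four values in the patch; enum page_cfg and both function names are invented for the example.

```c
/* Hypothetical enum standing in for the PPC_*_PAGES Kconfig choice. */
enum page_cfg { PAGES_4K, PAGES_16K, PAGES_64K, PAGES_256K };

/* Same values as the new PPC_PAGE_SHIFT defaults. */
static unsigned int page_shift(enum page_cfg cfg)
{
	switch (cfg) {
	case PAGES_256K:
		return 18;
	case PAGES_64K:
		return 16;
	case PAGES_16K:
		return 14;
	default:
		return 12;	/* 4K pages */
	}
}

/* Equivalent of PAGE_SIZE = 1 << PAGE_SHIFT. */
static unsigned long page_size(enum page_cfg cfg)
{
	return 1UL << page_shift(cfg);
}
```

Having the shift as a Kconfig symbol is what lets other entries, such as the ETEXT_SHIFT and DATA_SHIFT defaults later in the series, refer to it directly.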