Re: [PATCH v2 10/15] vrf: Remove the now superfluous sentinel element from ctl_table array

2023-10-02 Thread David Ahern
On 10/2/23 2:55 AM, Joel Granados via B4 Relay wrote:
> From: Joel Granados 
> 
> This commit comes at the tail end of a greater effort to remove the
> empty elements at the end of the ctl_table arrays (sentinels), which
> will reduce the overall build-time size of the kernel and the run-time
> memory bloat by ~64 bytes per sentinel (further information:
> https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> 
> Remove sentinel from vrf_table
> 
> Signed-off-by: Joel Granados 
> ---
>  drivers/net/vrf.c | 1 -
>  1 file changed, 1 deletion(-)
> 

Reviewed-by: David Ahern 
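
For readers skimming the archive, a compilable toy of the sentinel pattern this
series removes - a sketch assuming the series' counted-registration approach;
the real ctl_table layout and the vrf entries differ:

#include <stdio.h>

/* Old style: the registration code walked entries until it hit an
 * empty "sentinel" terminator. New style: an explicit count
 * (ARRAY_SIZE) is passed, so each array can drop its ~64-byte zeroed
 * tail element. */
struct toy_ctl_table { const char *procname; };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static struct toy_ctl_table vrf_table[] = {
	{ .procname = "strict_mode" },
	/* { } sentinel entry no longer needed once a count is passed */
};

int main(void)
{
	for (size_t i = 0; i < ARRAY_SIZE(vrf_table); i++)
		printf("%s\n", vrf_table[i].procname);
	return 0;
}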




Re: [RFC] Remove DECNET support from kernel

2022-08-01 Thread David Ahern
On 7/31/22 1:06 PM, Stephen Hemminger wrote:
> Decnet is an obsolete network protocol that receives more attention
> from kernel janitors than users. It belongs in a computer protocol
> history museum, not in the Linux kernel.
> 
> It has been orphaned in the kernel since 2010,
> and the documentation link on SourceForge says it is abandoned there.
> 
> Leave the UAPI alone to keep userspace programs compiling.
> 
> Signed-off-by: Stephen Hemminger 
> ---

Acked-by: David Ahern 




Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread David Ahern
On 5/27/20 3:07 PM, Paolo Bonzini wrote:
> I see what you meant now.  statsfs can also be used to enumerate objects
> if one is so inclined (with the prototype in patch 7, for example, each
> network interface becomes a directory).

There are many use cases that have hundreds to thousands of network devices.
Having a sysfs entry per device already bloats memory usage for these
use cases; another filesystem with an entry per device makes that worse.
It is really the wrong direction for large-scale systems.


Re: [next-20170124] Kernel oops(rt6_fill_node) during reboot of LPAR

2017-01-28 Thread David Ahern
On 1/27/17 5:54 AM, Sachin Sant wrote:
> While rebooting a PowerVM LPAR running 4.10.0-rc5-next-20170124
> on a POWER8 box, the following kernel oops is displayed.
> 
> This problem was introduced with next-20170123; next-20170120 works.
> Initial analysis points to the following patch, included with next-20170123:
> 
> commit a1a22c12060e4b9c52f45d4b3460f614e00162a2
> net: ipv6: Keep nexthop of multipath route on admin down

Thanks for the report. Fixed by:

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=1f17e2f2c8a8be3430813119fa7b633398f6185b


Re: [PATCH v11 2/4] perf,kvm/{x86,s390}: Remove const from kvm_events_tp

2016-01-28 Thread David Ahern

On 1/27/16 11:33 PM, Hemant Kumar wrote:

This patch removes the "const" qualifier from the kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.

Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com>
---


Acked-by: David Ahern <dsah...@gmail.com>
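
For reference, a compilable toy of why the const has to go - simplified names,
and the tracepoint strings below are illustrative, not the real powerpc ones:

#include <stdio.h>

/* With "const char *kvm_events_tp[]" the strings are immutable but the
 * pointers themselves stay writable, so arch setup code can retarget
 * entries at runtime; "const char * const" would make the assignments
 * below a compile error. */
static const char *kvm_events_tp[] = {
	"kvm:kvm_entry",
	"kvm:kvm_exit",
	NULL,
};

/* e.g. an arch picking machine-specific tracepoints at startup */
static void arch_fixup_events(void)
{
	kvm_events_tp[0] = "kvm_hv:kvm_guest_enter";
	kvm_events_tp[1] = "kvm_hv:kvm_guest_exit";
}

int main(void)
{
	arch_fixup_events();
	for (int i = 0; kvm_events_tp[i]; i++)
		printf("%s\n", kvm_events_tp[i]);
	return 0;
}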

Re: [PATCH v11 1/4] perf,kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h

2016-01-28 Thread David Ahern

On 1/27/16 11:33 PM, Hemant Kumar wrote:

It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.

Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com>
Acked-by: Alexander Yarygin <yary...@linux.vnet.ibm.com>


Acked-by: David Ahern <dsah...@gmail.com>

Re: [PATCH v9 1/4] perf,kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h

2015-10-07 Thread David Ahern

On 10/6/15 8:25 PM, Hemant Kumar wrote:

@@ -358,7 +357,12 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
	time_diff = sample->time - time_begin;

	if (kvm->duration && time_diff > kvm->duration) {
-		char decode[DECODE_STR_LEN];
+		char *decode = zalloc(decode_str_len);


decode can still be a stack variable even with variable length.


+
+		if (!decode) {
+			pr_err("Not enough memory\n");
+			return false;
+		}

		kvm->events_ops->decode_key(kvm, &event->key, decode);
		if (!skip_event(decode)) {
@@ -366,6 +370,7 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
				sample->time, sample->pid, vcpu_record->vcpu_id,
				decode, time_diff/1000);
		}
+		free(decode);
	}

	return update_kvm_event(event, vcpu, time_diff);
@@ -386,7 +391,8 @@ struct vcpu_event_record *per_vcpu_record(struct thread *thread,


-8<-


@@ -575,7 +581,7 @@ static void show_timeofday(void)

  static void print_result(struct perf_kvm_stat *kvm)
  {
-   char decode[DECODE_STR_LEN];
+   char *decode;


and a stack variable here too.

David
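
(For reference, a compilable toy of the stack-variable suggestion - simplified
names; decode_str_len stands in for the runtime length computed in the patch:)

#include <stdio.h>

/* A variable-length array keeps the runtime-sized buffer on the stack,
 * so the zalloc()/NULL-check/free() dance above goes away. */
static void show_decode(size_t decode_str_len)
{
	char decode[decode_str_len];	/* VLA: sized at runtime, stack-allocated */

	snprintf(decode, decode_str_len, "%s", "KVM_EXIT_REASON");
	printf("%s\n", decode);
}

int main(void)
{
	show_decode(32);
	return 0;
}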

Re: [PATCH v8 1/4] perf,kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h

2015-09-28 Thread David Ahern

On 9/28/15 7:00 AM, Alexander Yarygin wrote:

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index fc1cffb..ef25fcf 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -31,20 +31,18 @@
  #include 

  #ifdef HAVE_KVM_STAT_SUPPORT
-#include <asm/kvm_perf.h>
  #include "util/kvm-stat.h"

-void exit_event_get_key(struct perf_evsel *evsel,
-   struct perf_sample *sample,
+void exit_event_get_key(struct perf_evsel *evsel, struct perf_sample *sample,
struct event_key *key)
  {
key->info = 0;
-   key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON);
+   key->key = perf_evsel__intval(evsel, sample, exit_reason_code);
  }

  bool kvm_exit_event(struct perf_evsel *evsel)
  {
-   return !strcmp(evsel->name, KVM_EXIT_TRACE);
+   return !strncmp(evsel->name, kvm_events_tp[1], strlen(evsel->name));
  }


Hmm, direct access to kvm_events_tp? Maybe add a getter for this or
something like extern char *kvm_exit_trace;?
/* why strncmp? */



  bool exit_event_begin(struct perf_evsel *evsel,
@@ -60,7 +58,7 @@ bool exit_event_begin(struct perf_evsel *evsel,

  bool kvm_entry_event(struct perf_evsel *evsel)
  {
-   return !strcmp(evsel->name, KVM_ENTRY_TRACE);
+   return !strncmp(evsel->name, kvm_events_tp[0], strlen(evsel->name));
  }

  bool exit_event_end(struct perf_evsel *evsel,


I agree; don't rely on kvm_events_tp. Define KVM_ENTRY_TRACE and 
KVM_EXIT_TRACE like x86.



Re: [PATCH v8 1/4] perf,kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h

2015-09-28 Thread David Ahern

On 9/28/15 9:16 AM, Scott Wood wrote:

On Mon, 2015-09-28 at 08:31 -0600, David Ahern wrote:

On 9/28/15 7:00 AM, Alexander Yarygin wrote:

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index fc1cffb..ef25fcf 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -31,20 +31,18 @@
   #include 

   #ifdef HAVE_KVM_STAT_SUPPORT
-#include <asm/kvm_perf.h>
   #include "util/kvm-stat.h"

-void exit_event_get_key(struct perf_evsel *evsel,
-			struct perf_sample *sample,
+void exit_event_get_key(struct perf_evsel *evsel, struct perf_sample *sample,
 			struct event_key *key)
   {
   key->info = 0;
- key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON);
+ key->key = perf_evsel__intval(evsel, sample, exit_reason_code);
   }

   bool kvm_exit_event(struct perf_evsel *evsel)
   {
- return !strcmp(evsel->name, KVM_EXIT_TRACE);
+ return !strncmp(evsel->name, kvm_events_tp[1], strlen(evsel->name));
   }


Hmm, direct access to kvm_events_tp? Maybe add a getter for this or
something like extern char *kvm_exit_trace;?
/* why strncmp? */



   bool exit_event_begin(struct perf_evsel *evsel,
@@ -60,7 +58,7 @@ bool exit_event_begin(struct perf_evsel *evsel,

   bool kvm_entry_event(struct perf_evsel *evsel)
   {
- return !strcmp(evsel->name, KVM_ENTRY_TRACE);
+ return !strncmp(evsel->name, kvm_events_tp[0], strlen(evsel->name));
   }

   bool exit_event_end(struct perf_evsel *evsel,


I agree; don't rely on kvm_events_tp. Define KVM_ENTRY_TRACE and
KVM_EXIT_TRACE like x86.


If you mean defining them in uapi, that doesn't work for arches that have
multiple subarches that may have different trace events.  This patchset
doesn't actually implement dynamic support for the subarches, but it avoids
adding constants to uapi headers that only apply to one of the subarches.


I don't agree with relying on kvm_events_tp[0] and [1]. If you need that
to be a runtime definition, then change KVM_ENTRY_TRACE to const char
*kvm_entry_trace, and s390 and other arches can have code to set
kvm_{entry,exit}_trace at runtime.


David
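
(A compilable toy of that suggestion - all names, including the tracepoint
strings, are illustrative, not the merged code:)

#include <stdio.h>
#include <string.h>

/* The uapi constants become pointers that each (sub)arch sets at
 * startup, so the generic code keeps exact strcmp() matches. */
struct perf_evsel { const char *name; };

static const char *kvm_entry_trace;
static const char *kvm_exit_trace;

static void s390_setup_kvm_events(void)	/* per-(sub)arch init */
{
	kvm_entry_trace = "kvm:kvm_s390_sie_enter";
	kvm_exit_trace  = "kvm:kvm_s390_sie_exit";
}

static int kvm_entry_event(const struct perf_evsel *evsel)
{
	return !strcmp(evsel->name, kvm_entry_trace);
}

static int kvm_exit_event(const struct perf_evsel *evsel)
{
	return !strcmp(evsel->name, kvm_exit_trace);
}

int main(void)
{
	struct perf_evsel ev = { .name = "kvm:kvm_s390_sie_exit" };

	s390_setup_kvm_events();
	printf("entry=%d exit=%d\n", kvm_entry_event(&ev), kvm_exit_event(&ev));
	return 0;
}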

Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-16 Thread David Ahern

On 6/15/15 8:50 PM, Hemant Kumar wrote:

+/*
+ * Get the instruction pointer from the tracepoint data
+ */
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *data)
+{
+	u64 tp_ip = data->ip;
+	int trap;
+
+	if (!strcmp(KVMPPC_EXIT, evsel->name)) {
+		trap = raw_field_value(evsel->tp_format, "trap",
+				       data->raw_data);
+
+		if (trap == HV_DECREMENTER)
+			tp_ip = raw_field_value(evsel->tp_format, "pc",
+						data->raw_data);
+	}
+	return tp_ip;
+}


You can tie a handler to an event; see builtin-trace.c for an example
(evsel->handler = handler). Then have the sample handler call it (e.g.,
see trace__process_sample). Then you don't have to check event names on
each pass like this and can just do event-based processing.
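
(A compilable toy of that pattern - simplified stand-in types, not the real
perf structs; the sample IP value is made up:)

#include <stdio.h>

/* Attach a callback per evsel once at setup, then dispatch each sample
 * through it, instead of strcmp()ing event names on every sample. */
struct perf_sample { unsigned long long ip; };
struct perf_evsel {
	const char *name;
	void (*handler)(struct perf_evsel *evsel, struct perf_sample *sample);
};

static void kvmppc_exit_handler(struct perf_evsel *evsel,
				struct perf_sample *sample)
{
	printf("%s: ip=%#llx\n", evsel->name, sample->ip);
}

int main(void)
{
	struct perf_evsel evsel = { .name = "kvm_hv:kvm_guest_exit" };
	struct perf_sample sample = { .ip = 0xc000000000001234ULL };

	evsel.handler = kvmppc_exit_handler;	/* set once, per event */
	evsel.handler(&evsel, &sample);		/* called per sample */
	return 0;
}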



+/*
+ * Get the HV and PR bits and accordingly, determine the cpumode
+ */
+u8 arch__get_cpumode(union perf_event *event, struct perf_evsel *evsel,
+		     struct perf_sample *data)
+{
+	unsigned long hv, pr, msr;
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+	if (strcmp(KVMPPC_EXIT, evsel->name))
+		goto ret;
+
+	if (data->raw_data)
+		msr = raw_field_value(evsel->tp_format, "msr", data->raw_data);
+	else
+		goto ret;
+
+	hv = msr & ((long unsigned)1 << (PPC_MAX - HV_BIT));
+	pr = msr & ((long unsigned)1 << (PPC_MAX - PR_BIT));
+
+	if (!hv && pr)
+		cpumode = PERF_RECORD_MISC_GUEST_USER;
+	else
+		cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+ret:
+	return cpumode;
+}


Why isn't that set properly kernel side when the sample is generated?

David

Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-16 Thread David Ahern

On 6/16/15 7:24 PM, Hemant Kumar wrote:

Because this depends on the kernel tracepoint kvm_hv:kvm_guest_exit.
perf_prepare_sample() on the kernel side sets the event->header.misc field
to PERF_RECORD_MISC_KERNEL through perf_misc_flags(pt_regs). In the case of
tracepoints, which always get hit in the host kernel context,
perf_misc_flags() will always return PERF_RECORD_MISC_KERNEL.

IMHO we will instead have to set the cpumode in user space for this
tracepoint, and we can't depend on the event->header.misc field in this case.

What would you suggest?



Oh, right, you are using a tracepoint for this. It does not have the
hooks to specify cpumode. Never mind.


Re: Sampling instruction pointer on PPC

2012-03-01 Thread David Ahern

[Added linuxppc-dev list.]

On 3/1/12 10:08 AM, Victor Jimenez wrote:

I am trying to sample the instruction pointer over time on a Power7 system.
I know that there are accurate mechanisms to do so on Intel processors
(e.g., PEBS and Branch Trace Store).

Is it possible to do something similar on Power7? Will the samples be
accurate? I am worried that significant delays (skids) may appear.

Thank you,
Victor





Re: [PATCH] perf: powerpc: Disable pagefaults during callchain stack read

2011-08-01 Thread David Ahern


On 08/01/2011 04:39 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-08-01 at 11:59 +0200, Peter Zijlstra wrote:
>>> Signed-off-by: David Ahern dsah...@gmail.com
>>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>>> CC: Anton Blanchard an...@samba.org
>>
>> Hmm, Paul, didn't you fix something like this early on? Anyway, I've no
>> objections since I'm really not familiar enough with the PPC side of
>> things.
>
> I'm travelling so I haven't had a chance to review properly or even test,
> but it looks like an ad-hoc fix for the immediate problem.
>
> Ultimately, I want to rework that stuff to do a __gup_fast like x86 does
> (maybe as a fallback from an attempt at access first) so we work around
> access permissions blocked by lack of dirty/accessed bits, but in the
> meantime this should fix the immediate issue.

The problem goes back to all kernel releases with perf, so this patch
should get applied to the stable trains too.

David

> Cheers,
> Ben.
 
 


[PATCH] perf: powerpc: Disable pagefaults during callchain stack read

2011-07-30 Thread David Ahern
Panic observed on an older kernel when collecting call chains for
the context-switch software event:

 [b0180e00]rb_erase+0x1b4/0x3e8
 [b00430f4]__dequeue_entity+0x50/0xe8
 [b0043304]set_next_entity+0x178/0x1bc
 [b0043440]pick_next_task_fair+0xb0/0x118
 [b02ada80]schedule+0x500/0x614
 [b02afaa8]rwsem_down_failed_common+0xf0/0x264
 [b02afca0]rwsem_down_read_failed+0x34/0x54
 [b02aed4c]down_read+0x3c/0x54
 [b0023b58]do_page_fault+0x114/0x5e8
 [b001e350]handle_page_fault+0xc/0x80
 [b0022dec]perf_callchain+0x224/0x31c
 [b009ba70]perf_prepare_sample+0x240/0x2fc
 [b009d760]__perf_event_overflow+0x280/0x398
 [b009d914]perf_swevent_overflow+0x9c/0x10c
 [b009db54]perf_swevent_ctx_event+0x1d0/0x230
 [b009dc38]do_perf_sw_event+0x84/0xe4
 [b009dde8]perf_sw_event_context_switch+0x150/0x1b4
 [b009de90]perf_event_task_sched_out+0x44/0x2d4
 [b02ad840]schedule+0x2c0/0x614
 [b0047dc0]__cond_resched+0x34/0x90
 [b02adcc8]_cond_resched+0x4c/0x68
 [b00bccf8]move_page_tables+0xb0/0x418
 [b00d7ee0]setup_arg_pages+0x184/0x2a0
 [b0110914]load_elf_binary+0x394/0x1208
 [b00d6e28]search_binary_handler+0xe0/0x2c4
 [b00d834c]do_execve+0x1bc/0x268
 [b0015394]sys_execve+0x84/0xc8
 [b001df10]ret_from_syscall+0x0/0x3c

A page fault occurred walking the callchain while creating a perf
sample for the context-switch event. To handle the page fault the
mmap_sem is needed, but it is currently held by setup_arg_pages.
(setup_arg_pages calls shift_arg_pages with the mmap_sem held.
shift_arg_pages then calls move_page_tables which has a cond_resched
at the top of its for loop - hitting that cond_resched is what caused
the context switch.)

This is an extension of Anton's proposed patch:
https://lkml.org/lkml/2011/7/24/151
adding a case for 32-bit ppc.

Tested on the system that first generated the panic and then again
with the latest kernel using a PPC VM. I am not able to test the 64-bit
path - I do not have H/W for it, and 64-bit PPC VMs (qemu on Intel)
are horribly slow.

Signed-off-by: David Ahern dsah...@gmail.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Anton Blanchard an...@samba.org
CC: Peter Zijlstra a.p.zijls...@chello.nl
CC: Paul Mackerras pau...@samba.org
CC: Ingo Molnar mi...@elte.hu
CC: Arnaldo Carvalho de Melo a...@ghostprotocols.net
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org

---
 arch/powerpc/kernel/perf_callchain.c |   20 +---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/perf_callchain.c b/arch/powerpc/kernel/perf_callchain.c
index d05ae42..564c1d8 100644
--- a/arch/powerpc/kernel/perf_callchain.c
+++ b/arch/powerpc/kernel/perf_callchain.c
@@ -154,8 +154,12 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	    ((unsigned long)ptr & 7))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 8);
 }
@@ -166,8 +170,12 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 4);
 }
@@ -294,11 +302,17 @@ static inline int current_is_64bit(void)
  */
 static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 {
+	int rc;
+
 	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	return __get_user_inatomic(*ret, ptr);
+	pagefault_disable();
+	rc = __get_user_inatomic(*ret, ptr);
+	pagefault_enable();
+
+	return rc;
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry *entry,
-- 
1.7.6



Re: perf PPC: kernel panic with callchains and context switch events

2011-07-25 Thread David Ahern
Hi Ben:

On 07/24/2011 07:55 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2011-07-24 at 11:18 -0600, David Ahern wrote:
>> On 07/20/2011 03:57 PM, David Ahern wrote:
>>> I am hoping someone familiar with PPC can help understand a panic that
>>> is generated when capturing callchains with context switch events.
>>>
>>> Call trace is below. The short of it is that walking the callchain
>>> generates a page fault. To handle the page fault the mmap_sem is needed,
>>> but it is currently held by setup_arg_pages. setup_arg_pages calls
>>> shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
>>> move_page_tables, which has a cond_resched at the top of its for loop. If
>>> the cond_resched() is removed from move_page_tables everything works
>>> beautifully - no panics.
>>>
>>> So, the question: is it normal for walking the stack to trigger a page
>>> fault on PPC? The panic is not seen on x86 based systems.
>>
>> Can anyone confirm whether page faults while walking the stack are
>> normal for PPC? We really want to use the context switch event with
>> callchains and need to understand whether this behavior is normal. Of
>> course if it is normal, a way to address the problem without a panic
>> will be needed.
>
> Now that leads to interesting discoveries :-) Becky, can you read all
> the way through and let me know what you think?
>
> So, trying to walk the user stack directly will potentially cause page
> faults if it's done by direct access. So if you're going to do it in a
> spot where you can't afford it, you need to pagefault_disable(), I
> suppose. I think the problem with our existing code is that it's missing
> those around __get_user_inatomic().
>
> In fact, arguably, we don't want the hash code modifying the hash
> either (or even hashing things in). Our 64-bit code handles it today in
> perf_callchain.c in a way that involves pretty much duplicating the
> functionality of __get_user_pages_fast() as used by x86 (see below), but
> as a fallback from a direct access which misses the pagefault_disable()
> as well.
>
> I think it comes from an old assumption that this would always be called
> from an NMI, and the explicit tracepoints broke that assumption.
>
> In fact we probably want to bump the NMI count, not just the IRQ count
> as pagefault_disable() does, to make sure we prevent hashing.
>
> x86 does things differently, using __get_user_pages_fast() (a variant of
> get_user_pages_fast() that doesn't fall back to normal get_user_pages()).
>
> Now, we could do the same (use __gup_fast too), but I can see a
> potential issue with ppc 32-bit platforms that have 64-bit PTEs, since
> we could end up GUP'ing in the middle of the two accesses.
>
> Becky: I think gup_fast is generally broken on 32-bit with 64-bit PTEs
> because of that; the problem isn't specific to perf backtraces. I'll
> propose a solution further down.
>
> Now, on x86, there is a similar problem with PAE, which is handled by
>
>  - having gup disable IRQs
>  - relying on the fact that to change from a valid value to another valid
>    value, the PTE will first get invalidated, which requires an IPI
>    and thus will be blocked by our interrupts being off
>
> We do the first part, but the second part will break if we use HW TLB
> invalidation broadcast (yet another reason why those are bad; I think I
> will write a blog entry about it one of these days).
>
> I think we can work around this while keeping our broadcast TLB
> invalidations by having the invalidation code also increment a global
> generation count (using the existing lock used by the invalidation code;
> all 32-bit platforms have such a lock).
>
> From there, gup_fast can be changed to, with proper ordering, check the
> generation count around the loading of the PTE and loop if it has
> changed, kind of a seqlock.
>
> We also need the NMI count bump if we are going to try to keep the
> attempt at doing a direct access first for perf.
>
> Becky, do you feel like giving that a shot or should I find another
> victim? (Or even do it myself ...) :-)

Did you have something in mind besides the patch Anton sent? We'll give
that one a try and see how it works. (Thanks, Anton!)

David
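
(A rough, compilable sketch of the generation-count idea Ben describes above -
a single-threaded userspace toy, not kernel code; real code would need
smp_rmb()-style ordering around the reads, and all names are illustrative:)

#include <stdio.h>

/* The invalidation path bumps a generation counter under its existing
 * lock; gup_fast-style readers retry the two-halves PTE load until the
 * counter is stable around it, seqlock-style. */
static volatile unsigned long tlb_inval_gen;

struct pte64 { volatile unsigned int lo, hi; };	/* 64-bit PTE as two words */

/* invalidation side: bump the generation while rewriting the PTE */
static void invalidate_pte(struct pte64 *p)
{
	tlb_inval_gen++;		/* done under the invalidation lock */
	p->lo = 0;
	p->hi = 0;
}

/* reader side: loop until both halves were read in one generation */
static unsigned long long read_pte(struct pte64 *p)
{
	unsigned long gen;
	unsigned int lo, hi;

	do {
		gen = tlb_inval_gen;	/* barriers needed here in real code */
		lo = p->lo;
		hi = p->hi;
	} while (tlb_inval_gen != gen);

	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	struct pte64 pte = { .lo = 0x1234, .hi = 0x1 };

	printf("pte=%#llx\n", read_pte(&pte));
	invalidate_pte(&pte);
	printf("pte=%#llx\n", read_pte(&pte));
	return 0;
}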

 

Re: perf PPC: kernel panic with callchains and context switch events

2011-07-24 Thread David Ahern
On 07/20/2011 03:57 PM, David Ahern wrote:
> I am hoping someone familiar with PPC can help understand a panic that
> is generated when capturing callchains with context switch events.
>
> Call trace is below. The short of it is that walking the callchain
> generates a page fault. To handle the page fault the mmap_sem is needed,
> but it is currently held by setup_arg_pages. setup_arg_pages calls
> shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
> move_page_tables, which has a cond_resched at the top of its for loop. If
> the cond_resched() is removed from move_page_tables everything works
> beautifully - no panics.
>
> So, the question: is it normal for walking the stack to trigger a page
> fault on PPC? The panic is not seen on x86 based systems.

Can anyone confirm whether page faults while walking the stack are
normal for PPC? We really want to use the context switch event with
callchains and need to understand whether this behavior is normal. Of
course if it is normal, a way to address the problem without a panic
will be needed.

Thanks,
David

> [b0180e00]rb_erase+0x1b4/0x3e8
> [b00430f4]__dequeue_entity+0x50/0xe8
> [b0043304]set_next_entity+0x178/0x1bc
> [b0043440]pick_next_task_fair+0xb0/0x118
> [b02ada80]schedule+0x500/0x614
> [b02afaa8]rwsem_down_failed_common+0xf0/0x264
> [b02afca0]rwsem_down_read_failed+0x34/0x54
> [b02aed4c]down_read+0x3c/0x54
> [b0023b58]do_page_fault+0x114/0x5e8
> [b001e350]handle_page_fault+0xc/0x80
> [b0022dec]perf_callchain+0x224/0x31c
> [b009ba70]perf_prepare_sample+0x240/0x2fc
> [b009d760]__perf_event_overflow+0x280/0x398
> [b009d914]perf_swevent_overflow+0x9c/0x10c
> [b009db54]perf_swevent_ctx_event+0x1d0/0x230
> [b009dc38]do_perf_sw_event+0x84/0xe4
> [b009dde8]perf_sw_event_context_switch+0x150/0x1b4
> [b009de90]perf_event_task_sched_out+0x44/0x2d4
> [b02ad840]schedule+0x2c0/0x614
> [b0047dc0]__cond_resched+0x34/0x90
> [b02adcc8]_cond_resched+0x4c/0x68
> [b00bccf8]move_page_tables+0xb0/0x418
> [b00d7ee0]setup_arg_pages+0x184/0x2a0
> [b0110914]load_elf_binary+0x394/0x1208
> [b00d6e28]search_binary_handler+0xe0/0x2c4
> [b00d834c]do_execve+0x1bc/0x268
> [b0015394]sys_execve+0x84/0xc8
> [b001df10]ret_from_syscall+0x0/0x3c
>
> Thanks,
> David


Fwd: perf PPC: kernel panic with callchains and context switch events

2011-07-20 Thread David Ahern
[suggestion to try this mailing list as well]

 Original Message 
Subject: perf PPC: kernel panic with callchains and context switch events
Date: Wed, 20 Jul 2011 15:57:51 -0600
From: David Ahern dsah...@gmail.com
To: Anton Blanchard an...@samba.org, Paul Mackerras
pau...@samba.org,  linux-perf-us...@vger.kernel.org, LKML
linux-ker...@vger.kernel.org

I am hoping someone familiar with PPC can help understand a panic that
is generated when capturing callchains with context switch events.

Call trace is below. The short of it is that walking the callchain
generates a page fault. To handle the page fault the mmap_sem is needed,
but it is currently held by setup_arg_pages. setup_arg_pages calls
shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
move_page_tables, which has a cond_resched at the top of its for loop. If
the cond_resched() is removed from move_page_tables everything works
beautifully - no panics.

So, the question: is it normal for walking the stack to trigger a page
fault on PPC? The panic is not seen on x86 based systems.

 [b0180e00]rb_erase+0x1b4/0x3e8
 [b00430f4]__dequeue_entity+0x50/0xe8
 [b0043304]set_next_entity+0x178/0x1bc
 [b0043440]pick_next_task_fair+0xb0/0x118
 [b02ada80]schedule+0x500/0x614
 [b02afaa8]rwsem_down_failed_common+0xf0/0x264
 [b02afca0]rwsem_down_read_failed+0x34/0x54
 [b02aed4c]down_read+0x3c/0x54
 [b0023b58]do_page_fault+0x114/0x5e8
 [b001e350]handle_page_fault+0xc/0x80
 [b0022dec]perf_callchain+0x224/0x31c
 [b009ba70]perf_prepare_sample+0x240/0x2fc
 [b009d760]__perf_event_overflow+0x280/0x398
 [b009d914]perf_swevent_overflow+0x9c/0x10c
 [b009db54]perf_swevent_ctx_event+0x1d0/0x230
 [b009dc38]do_perf_sw_event+0x84/0xe4
 [b009dde8]perf_sw_event_context_switch+0x150/0x1b4
 [b009de90]perf_event_task_sched_out+0x44/0x2d4
 [b02ad840]schedule+0x2c0/0x614
 [b0047dc0]__cond_resched+0x34/0x90
 [b02adcc8]_cond_resched+0x4c/0x68
 [b00bccf8]move_page_tables+0xb0/0x418
 [b00d7ee0]setup_arg_pages+0x184/0x2a0
 [b0110914]load_elf_binary+0x394/0x1208
 [b00d6e28]search_binary_handler+0xe0/0x2c4
 [b00d834c]do_execve+0x1bc/0x268
 [b0015394]sys_execve+0x84/0xc8
 [b001df10]ret_from_syscall+0x0/0x3c

Thanks,
David