Re: [PATCH] perf: powerpc: Disable pagefaults during callchain stack read
On Sat, 2011-07-30 at 14:53 -0600, David Ahern wrote:

> A page fault occurred walking the callchain while creating a perf sample
> for the context-switch event. To handle the page fault the mmap_sem is
> needed, but it is currently held by setup_arg_pages. (setup_arg_pages
> calls shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
> move_page_tables, which has a cond_resched at the top of its for loop -
> hitting that cond_resched is what caused the context switch.)
>
> This is an extension of Anton's proposed patch:
> https://lkml.org/lkml/2011/7/24/151, adding a case for 32-bit ppc.
>
> Tested on the system that first generated the panic and then again with
> the latest kernel using a PPC VM. I am not able to test the 64-bit path -
> I do not have H/W for it, and 64-bit PPC VMs (qemu on Intel) are
> horribly slow.
>
> Signed-off-by: David Ahern dsah...@gmail.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Anton Blanchard an...@samba.org

Hmm, Paul, didn't you fix something like this early on?

Anyway, I've no objections, since I'm really not familiar enough with the
PPC side of things.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] perf: powerpc: Disable pagefaults during callchain stack read
On Mon, 2011-08-01 at 11:59 +0200, Peter Zijlstra wrote:

> > Signed-off-by: David Ahern dsah...@gmail.com
> > CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> > CC: Anton Blanchard an...@samba.org
>
> Hmm, Paul, didn't you fix something like this early on?
>
> Anyway, I've no objections, since I'm really not familiar enough with the
> PPC side of things.

I'm travelling, so I haven't had a chance to review properly or even test,
but it looks like an ad-hoc fix for the immediate problem.

Ultimately, I want to rework that stuff to do a __gup_fast like x86 does
(maybe as a fallback from an attempt at access first) so we work around
access permissions blocked by lack of dirty/accessed bits, but in the
meantime, this should fix the immediate issue.

Cheers,
Ben.
Re: [PATCH] perf: powerpc: Disable pagefaults during callchain stack read
On 08/01/2011 04:39 AM, Benjamin Herrenschmidt wrote:

> On Mon, 2011-08-01 at 11:59 +0200, Peter Zijlstra wrote:
> > > Signed-off-by: David Ahern dsah...@gmail.com
> > > CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> > > CC: Anton Blanchard an...@samba.org
> >
> > Hmm, Paul, didn't you fix something like this early on?
> >
> > Anyway, I've no objections since I'm really not familiar enough with
> > the PPC side of things.
>
> I'm travelling so I haven't had a chance to review properly or even test
> but it looks like an ad-hoc fix for the immediate problem.
>
> Ultimately, I want to rework that stuff to do a __gup_fast like x86 does
> (maybe as a fallback from an attempt at access first) so we work around
> access permissions blocked by lack of dirty/accessed bits but in the
> meantime, this should fix the immediate issue.

The problem goes back to all kernel releases with perf, so this patch
should get applied to the stable trains too.

David

> Cheers,
> Ben.
[PATCH] perf: powerpc: Disable pagefaults during callchain stack read
Panic observed on an older kernel when collecting call chains for the
context-switch software event:

  [b0180e00] rb_erase+0x1b4/0x3e8
  [b00430f4] __dequeue_entity+0x50/0xe8
  [b0043304] set_next_entity+0x178/0x1bc
  [b0043440] pick_next_task_fair+0xb0/0x118
  [b02ada80] schedule+0x500/0x614
  [b02afaa8] rwsem_down_failed_common+0xf0/0x264
  [b02afca0] rwsem_down_read_failed+0x34/0x54
  [b02aed4c] down_read+0x3c/0x54
  [b0023b58] do_page_fault+0x114/0x5e8
  [b001e350] handle_page_fault+0xc/0x80
  [b0022dec] perf_callchain+0x224/0x31c
  [b009ba70] perf_prepare_sample+0x240/0x2fc
  [b009d760] __perf_event_overflow+0x280/0x398
  [b009d914] perf_swevent_overflow+0x9c/0x10c
  [b009db54] perf_swevent_ctx_event+0x1d0/0x230
  [b009dc38] do_perf_sw_event+0x84/0xe4
  [b009dde8] perf_sw_event_context_switch+0x150/0x1b4
  [b009de90] perf_event_task_sched_out+0x44/0x2d4
  [b02ad840] schedule+0x2c0/0x614
  [b0047dc0] __cond_resched+0x34/0x90
  [b02adcc8] _cond_resched+0x4c/0x68
  [b00bccf8] move_page_tables+0xb0/0x418
  [b00d7ee0] setup_arg_pages+0x184/0x2a0
  [b0110914] load_elf_binary+0x394/0x1208
  [b00d6e28] search_binary_handler+0xe0/0x2c4
  [b00d834c] do_execve+0x1bc/0x268
  [b0015394] sys_execve+0x84/0xc8
  [b001df10] ret_from_syscall+0x0/0x3c

A page fault occurred walking the callchain while creating a perf sample
for the context-switch event. To handle the page fault the mmap_sem is
needed, but it is currently held by setup_arg_pages. (setup_arg_pages
calls shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
move_page_tables, which has a cond_resched at the top of its for loop -
hitting that cond_resched is what caused the context switch.)

This is an extension of Anton's proposed patch:
https://lkml.org/lkml/2011/7/24/151, adding a case for 32-bit ppc.

Tested on the system that first generated the panic and then again with
the latest kernel using a PPC VM. I am not able to test the 64-bit path -
I do not have H/W for it, and 64-bit PPC VMs (qemu on Intel) are horribly
slow.
Signed-off-by: David Ahern dsah...@gmail.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Anton Blanchard an...@samba.org
CC: Peter Zijlstra a.p.zijls...@chello.nl
CC: Paul Mackerras pau...@samba.org
CC: Ingo Molnar mi...@elte.hu
CC: Arnaldo Carvalho de Melo a...@ghostprotocols.net
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
---
 arch/powerpc/kernel/perf_callchain.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/perf_callchain.c b/arch/powerpc/kernel/perf_callchain.c
index d05ae42..564c1d8 100644
--- a/arch/powerpc/kernel/perf_callchain.c
+++ b/arch/powerpc/kernel/perf_callchain.c
@@ -154,8 +154,12 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	    ((unsigned long)ptr & 7))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 8);
 }
@@ -166,8 +170,12 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 4);
 }
@@ -294,11 +302,17 @@ static inline int current_is_64bit(void)
  */
 static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 {
+	int rc;
+
 	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	return __get_user_inatomic(*ret, ptr);
+	pagefault_disable();
+	rc = __get_user_inatomic(*ret, ptr);
+	pagefault_enable();
+
+	return rc;
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry *entry,
-- 
1.7.6
[PATCH] perf: powerpc: Disable pagefaults during callchain stack read
Hi David,

> I am hoping someone familiar with PPC can help understand a panic that
> is generated when capturing callchains with context switch events. Call
> trace is below.
>
> The short of it is that walking the callchain generates a page fault. To
> handle the page fault the mmap_sem is needed, but it is currently held
> by setup_arg_pages. setup_arg_pages calls shift_arg_pages with the
> mmap_sem held. shift_arg_pages then calls move_page_tables, which has a
> cond_resched at the top of its for loop. If the cond_resched() is
> removed from move_page_tables, everything works beautifully - no panics.
>
> So, the question: is it normal for walking the stack to trigger a page
> fault on PPC? The panic is not seen on x86 based systems. Can anyone
> confirm whether page faults while walking the stack are normal for PPC?
> We really want to use the context switch event with callchains and need
> to understand whether this behavior is normal. Of course, if it is
> normal, a way to address the problem without a panic will be needed.

I talked to Ben about this last week and he pointed me at
pagefault_disable/enable. Untested patch below.

Anton
--

We need to disable pagefaults when reading the stack, otherwise we can
lock up trying to take the mmap_sem when the code we are profiling
already has a write lock taken. This will not happen for hardware
events, but could for software events.
Reported-by: David Ahern dsah...@gmail.com
Signed-off-by: Anton Blanchard an...@samba.org
Cc: sta...@kernel.org
---

Index: linux-powerpc/arch/powerpc/kernel/perf_callchain.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/kernel/perf_callchain.c	2011-07-25 09:54:27.296757427 +1000
+++ linux-powerpc/arch/powerpc/kernel/perf_callchain.c	2011-07-25 09:56:08.828367882 +1000
@@ -154,8 +154,12 @@ static int read_user_stack_64(unsigned l
 	    ((unsigned long)ptr & 7))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 8);
 }
@@ -166,8 +170,12 @@ static int read_user_stack_32(unsigned i
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 4);
 }