Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On 16-06-02 09:15 AM, Russell King - ARM Linux wrote: > Hi, can I add a: > > Tested-by: Simon Marchi > > tag to the commit please? Yes, of course.
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On Wed, Jun 01, 2016 at 08:54:05AM -0400, Simon Marchi wrote: > On 16-05-30 05:35 PM, Russell King - ARM Linux wrote: > > So, the gdb verisons I have here seem to be particularly poor - but with > > some modifications, I can test out on iMX6 by forcing gdb to do the right > > thing - by inserting a couple of "mov r0, r0" instructions after the > > "break_here" label. > > I see that problem too with older versions, bisecting shows it has been fixed > in commit > > 6e22494e5076 Do not skip prologue for asm (.S) files > > in gdb, which is included in gdb 7.10 and up. > > > With that, on a single CPU, it seems to work correctly every time, but > > if I bring up a secondary CPU I start seeing the same problems you've > > reported - which seems to need the following patch to solve. Please can > > you check whether this resolves your problem? > > Yes that fixes the problem, the test case succeeds every time. I have stared > at those lines in ptrace.c for some time, but couldn't find the problem. > Thanks > for looking into it! Hi, can I add a: Tested-by: Simon Marchi tag to the commit please? Many thanks. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On 16-05-30 05:35 PM, Russell King - ARM Linux wrote: > So, the gdb verisons I have here seem to be particularly poor - but with > some modifications, I can test out on iMX6 by forcing gdb to do the right > thing - by inserting a couple of "mov r0, r0" instructions after the > "break_here" label. I see that problem too with older versions, bisecting shows it has been fixed in commit 6e22494e5076 Do not skip prologue for asm (.S) files in gdb, which is included in gdb 7.10 and up. > With that, on a single CPU, it seems to work correctly every time, but > if I bring up a secondary CPU I start seeing the same problems you've > reported - which seems to need the following patch to solve. Please can > you check whether this resolves your problem? Yes that fixes the problem, the test case succeeds every time. I have stared at those lines in ptrace.c for some time, but couldn't find the problem. Thanks for looking into it!
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On Tue, May 31, 2016 at 02:52:52PM +0100, Will Deacon wrote: > The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN > allow switching to the child (but even then, how is the parent doing > to issue such a request?). I can't see how that would be possible - the parent needs to finish the vfp_set() call before it can issue any others - and there can only be one tracer. IOW, if we're in vfp_set(), the thread must have attached to the child, at which point PTRACE_SEIZE will be rejected. PTRACE_LISTEN also requires the child to be in STOP state as well, and again, can only be issued from the current tracer. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On Mon, May 30, 2016 at 11:40:28PM +0100, Russell King - ARM Linux wrote: > On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote: > > With that, on a single CPU, it seems to work correctly every time, but > > if I bring up a secondary CPU I start seeing the same problems you've > > reported - which seems to need the following patch to solve. Please can > > you check whether this resolves your problem? > > More background behind this patch... I think that commit 8130b9d7b9d8 > ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") > was wrong. > > Let's look at what's happening. The above commit had the following effect: > > vfp_sync_hwstate(); > new_vfp = thread->vfpstate.hard; > modify new_vfp (but not new_vfp.cpu) > + vfp_flush_hwstate(thread); > thread->vfpstate.hard = new_vfp; > - vfp_flush_hwstate(thread); > > Now, the commit above claims that a context switch mid-copy of new_vfp > into thread->vfpstate.hard may result in corrupted state. > > I'm not quite sure how we could get to that point: the only thread which > is allowed to touch the traced child's state is the ptracing parent, and > the ptracing parent can't do anything until after vfp_set() has returned. > Even if the traced child gets a fatal signal, the point at which the > signal is delivered to the child is when returning to userspace, which > means the child must be runnable. That in turn means that we are not > in the middle of changing the register state. > > So, I'm not sure where Will got the idea that the partially copied state > would be visible: it shouldn't ever be visible. If it is visible, then > we've got problems even with Will's patch applied, because a partial > copy whenever it happens would be visible. > > In any case, the above change is wrong: vfp_flush_hwstate() sets > thread->vfpstate.hard.cpu to an invalid CPU number. Switching the order > as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads > to the bug you are seeing. > > So, I think the correct approach is to revert it. Yes, it looks like I was over-zealous in fixing all users of vfp_flush_hwstate after I fixed the signal handling code. Given that the child isn't runnable until the parent has finished poking at the register state, the original code looks fine. The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN allow switching to the child (but even then, how is the parent doing to issue such a request?). Will
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote: > With that, on a single CPU, it seems to work correctly every time, but > if I bring up a secondary CPU I start seeing the same problems you've > reported - which seems to need the following patch to solve. Please can > you check whether this resolves your problem? More background behind this patch... I think that commit 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") was wrong. Let's look at what's happening. The above commit had the following effect: vfp_sync_hwstate(); new_vfp = thread->vfpstate.hard; modify new_vfp (but not new_vfp.cpu) + vfp_flush_hwstate(thread); thread->vfpstate.hard = new_vfp; - vfp_flush_hwstate(thread); Now, the commit above claims that a context switch mid-copy of new_vfp into thread->vfpstate.hard may result in corrupted state. I'm not quite sure how we could get to that point: the only thread which is allowed to touch the traced child's state is the ptracing parent, and the ptracing parent can't do anything until after vfp_set() has returned. Even if the traced child gets a fatal signal, the point at which the signal is delivered to the child is when returning to userspace, which means the child must be runnable. That in turn means that we are not in the middle of changing the register state. So, I'm not sure where Will got the idea that the partially copied state would be visible: it shouldn't ever be visible. If it is visible, then we've got problems even with Will's patch applied, because a partial copy whenever it happens would be visible. In any case, the above change is wrong: vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to an invalid CPU number. Switching the order as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads to the bug you are seeing. So, I think the correct approach is to revert it. If the partially copied state _is_ visible somehow, that needs to be better documented - the original commit fails to indicate how the partially copied state becomes visible as a result of a context switch. > > Thanks. > > arch/arm/kernel/ptrace.c | 2 +- > 1 files changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c > index ef9119f7462e..4d9375814b53 100644 > --- a/arch/arm/kernel/ptrace.c > +++ b/arch/arm/kernel/ptrace.c > @@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target, > if (ret) > return ret; > > - vfp_flush_hwstate(thread); > thread->vfpstate.hard = new_vfp; > + vfp_flush_hwstate(thread); > > return 0; > } > > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
On Mon, May 30, 2016 at 01:48:11PM -0400, Simon Marchi wrote: > Hello knowledgeable ARM people! > > (Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html ) > > Debugging a flaky GDB test case on ARM lead me to think there might > be race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM > (PTRACE_SETVFPREGS is ARM-specific anyway). The test case (and the > reproducer below) changes the value of a VFP register (let's say d0) > using PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT. It > happens intermittently that the thread resumes execution with the > old value in d0 instead of the new one. So, I thought I'd look into this, and what I see here on my systems (whether it be Marvell Dove or iMX6) is that the program always exits with a return code of 1. Investigation on the Marvell Dove platform leads me to conclude that Ubuntu 14.04 gdb (7.7.1-0ubuntu5~14.04.2) is built without support for VFP - if I add an "info float" into the gdb script, I get: Breakpoint 1, break_here () at test.S:8 8 vmrs APSR_nzcv, fpscr No floating-point info available for this processor. which is incredibly annoying, because it means that your "p $d0 = 4.0" line has no effect on the VFP state - hence why its always exiting with 1. However, if we look closer, we see that gdb has decided to put the breakpoint _after_ the comparison instruction, as confirmed by the disassembly after the breakpoint is hit: 0x8108 <+0>: vcmp.f64d0, d1 => 0x810c <+4>: vmrsAPSR_nzcv, fpscr 0x8110 <+8>: moveq r0, #1 On iMX6, where I have Ubuntu 12.04 gdb (7.4-2012.04-0ubuntu2.1), "info float" works as one expects, but we still end up with the program exiting with a code of 1 - every time - because again, the breakpoint is misplaced. So, the gdb verisons I have here seem to be particularly poor - but with some modifications, I can test out on iMX6 by forcing gdb to do the right thing - by inserting a couple of "mov r0, r0" instructions after the "break_here" label. With that, on a single CPU, it seems to work correctly every time, but if I bring up a secondary CPU I start seeing the same problems you've reported - which seems to need the following patch to solve. Please can you check whether this resolves your problem? Thanks. arch/arm/kernel/ptrace.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c index ef9119f7462e..4d9375814b53 100644 --- a/arch/arm/kernel/ptrace.c +++ b/arch/arm/kernel/ptrace.c @@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target, if (ret) return ret; - vfp_flush_hwstate(thread); thread->vfpstate.hard = new_vfp; + vfp_flush_hwstate(thread); return 0; } -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
Hello knowledgeable ARM people! (Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html ) Debugging a flaky GDB test case on ARM lead me to think there might be race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM (PTRACE_SETVFPREGS is ARM-specific anyway). The test case (and the reproducer below) changes the value of a VFP register (let's say d0) using PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT. It happens intermittently that the thread resumes execution with the old value in d0 instead of the new one. Here is a minimal reproducing example. test.S: .global _start _start: vldr.64 d0, constant vldr.64 d1, constant break_here: vcmp.f64 d0, d1 vmrs APSR_nzcv, fpscr # Exit code moveq r0, #1 movne r0, #0 # Exit syscall mov r7, #1 svc 0 .align 8 constant: .word 0xc8b43958 .word 0x40594676 Built with: $ gcc -g3 -O0 -o test test.S -nostdlib And the gdb script, test.gdb: file test b break_here run p $d0 = 4.0 c The test is ran with $ ./gdb -nx -x test.gdb -batch The test loads the same constant in d0 and d1. It then does a comparison between them and exits with 1 (failure) if they are the same, 0 (success) if they are different. The GDB script breaks at "break_here", tries to change the value of d0 to some other constant (4.0) and lets the program continue and exit. If our register write succeeded, the program should exit with 0 (values are different). If our register write failed, the program will exit with 1 (values are still the same). The result is that I randomly see both cases, hinting to a race between the register write and the time where the kernel restores the thread's vfp registers. Note that when GDB's affinity is pinned to a single core, I do not see the failure. Also, note that when I remove the vldr.64 instructions, I can't seem to reproduce the problem, so it looks like they are somehow important. I see this behavior on 3 different boards: - ODroid XU-4, kernel 3.10.96 - Firefly RK3288, kernel 3.10.0 - Raspberry Pi 2, kernel 4.4.8 Any ideas about this problem? Thanks, Simon