subject:"Possible race between PTRACE_SETVFPREGS and PTRACE

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-06-02 Thread Simon Marchi

On 16-06-02 09:15 AM, Russell King - ARM Linux wrote:
> Hi, can I add a:
> 
> Tested-by: Simon Marchi 
> 
> tag to the commit please?

Yes, of course.

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-06-02 Thread Russell King - ARM Linux

On Wed, Jun 01, 2016 at 08:54:05AM -0400, Simon Marchi wrote:
> On 16-05-30 05:35 PM, Russell King - ARM Linux wrote:
> > So, the gdb verisons I have here seem to be particularly poor - but with
> > some modifications, I can test out on iMX6 by forcing gdb to do the right
> > thing - by inserting a couple of "mov r0, r0" instructions after the
> > "break_here" label.
> 
> I see that problem too with older versions, bisecting shows it has been fixed
> in commit
> 
>   6e22494e5076 Do not skip prologue for asm (.S) files
> 
> in gdb, which is included in gdb 7.10 and up.
> 
> > With that, on a single CPU, it seems to work correctly every time, but
> > if I bring up a secondary CPU I start seeing the same problems you've
> > reported - which seems to need the following patch to solve.  Please can
> > you check whether this resolves your problem?
> 
> Yes that fixes the problem, the test case succeeds every time.  I have stared
> at those lines in ptrace.c for some time, but couldn't find the problem.  
> Thanks
> for looking into it!

Hi, can I add a:

Tested-by: Simon Marchi 

tag to the commit please?

Many thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-06-01 Thread Simon Marchi

On 16-05-30 05:35 PM, Russell King - ARM Linux wrote:
> So, the gdb verisons I have here seem to be particularly poor - but with
> some modifications, I can test out on iMX6 by forcing gdb to do the right
> thing - by inserting a couple of "mov r0, r0" instructions after the
> "break_here" label.

I see that problem too with older versions, bisecting shows it has been fixed
in commit

  6e22494e5076 Do not skip prologue for asm (.S) files

in gdb, which is included in gdb 7.10 and up.

> With that, on a single CPU, it seems to work correctly every time, but
> if I bring up a secondary CPU I start seeing the same problems you've
> reported - which seems to need the following patch to solve.  Please can
> you check whether this resolves your problem?

Yes that fixes the problem, the test case succeeds every time.  I have stared
at those lines in ptrace.c for some time, but couldn't find the problem.  Thanks
for looking into it!

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-05-31 Thread Russell King - ARM Linux

On Tue, May 31, 2016 at 02:52:52PM +0100, Will Deacon wrote:
> The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN
> allow switching to the child (but even then, how is the parent doing
> to issue such a request?).

I can't see how that would be possible - the parent needs to finish
the vfp_set() call before it can issue any others - and there can
only be one tracer.  IOW, if we're in vfp_set(), the thread must
have attached to the child, at which point PTRACE_SEIZE will be
rejected.

PTRACE_LISTEN also requires the child to be in STOP state as well, and
again, can only be issued from the current tracer.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-05-31 Thread Will Deacon

On Mon, May 30, 2016 at 11:40:28PM +0100, Russell King - ARM Linux wrote:
> On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote:
> > With that, on a single CPU, it seems to work correctly every time, but
> > if I bring up a secondary CPU I start seeing the same problems you've
> > reported - which seems to need the following patch to solve.  Please can
> > you check whether this resolves your problem?
> 
> More background behind this patch... I think that commit 8130b9d7b9d8
> ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
> was wrong.
> 
> Let's look at what's happening.  The above commit had the following effect:
> 
>   vfp_sync_hwstate();
>   new_vfp = thread->vfpstate.hard;
>   modify new_vfp (but not new_vfp.cpu)
> + vfp_flush_hwstate(thread);
>   thread->vfpstate.hard = new_vfp;
> - vfp_flush_hwstate(thread);
> 
> Now, the commit above claims that a context switch mid-copy of new_vfp
> into thread->vfpstate.hard may result in corrupted state.
> 
> I'm not quite sure how we could get to that point: the only thread which
> is allowed to touch the traced child's state is the ptracing parent, and
> the ptracing parent can't do anything until after vfp_set() has returned.
> Even if the traced child gets a fatal signal, the point at which the
> signal is delivered to the child is when returning to userspace, which
> means the child must be runnable.  That in turn means that we are not
> in the middle of changing the register state.
> 
> So, I'm not sure where Will got the idea that the partially copied state
> would be visible: it shouldn't ever be visible.  If it is visible, then
> we've got problems even with Will's patch applied, because a partial
> copy whenever it happens would be visible.
> 
> In any case, the above change is wrong: vfp_flush_hwstate() sets
> thread->vfpstate.hard.cpu to an invalid CPU number.  Switching the order
> as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads
> to the bug you are seeing.
> 
> So, I think the correct approach is to revert it.

Yes, it looks like I was over-zealous in fixing all users of
vfp_flush_hwstate after I fixed the signal handling code. Given that the
child isn't runnable until the parent has finished poking at the register
state, the original code looks fine.

The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN
allow switching to the child (but even then, how is the parent doing
to issue such a request?).

Will

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-05-30 Thread Russell King - ARM Linux

On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote:
> With that, on a single CPU, it seems to work correctly every time, but
> if I bring up a secondary CPU I start seeing the same problems you've
> reported - which seems to need the following patch to solve.  Please can
> you check whether this resolves your problem?

More background behind this patch... I think that commit 8130b9d7b9d8
("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
was wrong.

Let's look at what's happening.  The above commit had the following effect:

vfp_sync_hwstate();
new_vfp = thread->vfpstate.hard;
modify new_vfp (but not new_vfp.cpu)
+   vfp_flush_hwstate(thread);
thread->vfpstate.hard = new_vfp;
-   vfp_flush_hwstate(thread);

Now, the commit above claims that a context switch mid-copy of new_vfp
into thread->vfpstate.hard may result in corrupted state.

I'm not quite sure how we could get to that point: the only thread which
is allowed to touch the traced child's state is the ptracing parent, and
the ptracing parent can't do anything until after vfp_set() has returned.
Even if the traced child gets a fatal signal, the point at which the
signal is delivered to the child is when returning to userspace, which
means the child must be runnable.  That in turn means that we are not
in the middle of changing the register state.

So, I'm not sure where Will got the idea that the partially copied state
would be visible: it shouldn't ever be visible.  If it is visible, then
we've got problems even with Will's patch applied, because a partial
copy whenever it happens would be visible.

In any case, the above change is wrong: vfp_flush_hwstate() sets
thread->vfpstate.hard.cpu to an invalid CPU number.  Switching the order
as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads
to the bug you are seeing.

So, I think the correct approach is to revert it.

If the partially copied state _is_ visible somehow, that needs to be
better documented - the original commit fails to indicate how the
partially copied state becomes visible as a result of a context switch.

> 
> Thanks.
> 
>  arch/arm/kernel/ptrace.c | 2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
> index ef9119f7462e..4d9375814b53 100644
> --- a/arch/arm/kernel/ptrace.c
> +++ b/arch/arm/kernel/ptrace.c
> @@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target,
>   if (ret)
>   return ret;
>  
> - vfp_flush_hwstate(thread);
>   thread->vfpstate.hard = new_vfp;
> + vfp_flush_hwstate(thread);
>  
>   return 0;
>  }
> 
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-05-30 Thread Russell King - ARM Linux

On Mon, May 30, 2016 at 01:48:11PM -0400, Simon Marchi wrote:
> Hello knowledgeable ARM people!
> 
> (Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html )
> 
> Debugging a flaky GDB test case on ARM lead me to think there might
> be race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM
> (PTRACE_SETVFPREGS is ARM-specific anyway).  The test case (and the
> reproducer below) changes the value of a VFP register (let's say d0)
> using PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT.  It
> happens intermittently that the thread resumes execution with the
> old value in d0 instead of the new one.

So, I thought I'd look into this, and what I see here on my systems
(whether it be Marvell Dove or iMX6) is that the program always exits
with a return code of 1.

Investigation on the Marvell Dove platform leads me to conclude that
Ubuntu 14.04 gdb (7.7.1-0ubuntu5~14.04.2) is built without support for
VFP - if I add an "info float" into the gdb script, I get:

Breakpoint 1, break_here () at test.S:8
8 vmrs APSR_nzcv, fpscr
No floating-point info available for this processor.

which is incredibly annoying, because it means that your "p $d0 = 4.0"
line has no effect on the VFP state - hence why its always exiting with
1.

However, if we look closer, we see that gdb has decided to put the
breakpoint _after_ the comparison instruction, as confirmed by the
disassembly after the breakpoint is hit:

   0x8108 <+0>: vcmp.f64d0, d1
=> 0x810c <+4>: vmrsAPSR_nzcv, fpscr
   0x8110 <+8>: moveq   r0, #1

On iMX6, where I have Ubuntu 12.04 gdb (7.4-2012.04-0ubuntu2.1), "info
float" works as one expects, but we still end up with the program
exiting with a code of 1 - every time - because again, the breakpoint
is misplaced.

So, the gdb verisons I have here seem to be particularly poor - but with
some modifications, I can test out on iMX6 by forcing gdb to do the right
thing - by inserting a couple of "mov r0, r0" instructions after the
"break_here" label.

With that, on a single CPU, it seems to work correctly every time, but
if I bring up a secondary CPU I start seeing the same problems you've
reported - which seems to need the following patch to solve.  Please can
you check whether this resolves your problem?

Thanks.

 arch/arm/kernel/ptrace.c | 2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index ef9119f7462e..4d9375814b53 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target,
if (ret)
return ret;

-   vfp_flush_hwstate(thread);
thread->vfpstate.hard = new_vfp;
+   vfp_flush_hwstate(thread);

return 0;
 }

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

2016-05-30 Thread Simon Marchi

Hello knowledgeable ARM people!

(Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html )

Debugging a flaky GDB test case on ARM lead me to think there might
be race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM
(PTRACE_SETVFPREGS is ARM-specific anyway).  The test case (and the
reproducer below) changes the value of a VFP register (let's say d0)
using PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT.  It
happens intermittently that the thread resumes execution with the
old value in d0 instead of the new one.

Here is a minimal reproducing example.

test.S:

  .global _start
  _start:
  vldr.64 d0, constant
  vldr.64 d1, constant

  break_here:
  vcmp.f64 d0, d1
  vmrs APSR_nzcv, fpscr

  # Exit code
  moveq r0, #1
  movne r0, #0

  # Exit syscall
  mov r7, #1
  svc 0

  .align 8
  constant:
  .word 0xc8b43958
  .word 0x40594676

Built with:

  $ gcc -g3 -O0 -o test test.S -nostdlib

And the gdb script, test.gdb:

  file test
  b break_here
  run
  p $d0 = 4.0
  c

The test is ran with

  $ ./gdb -nx -x test.gdb -batch

The test loads the same constant in d0 and d1.  It then does a comparison 
between
them and exits with 1 (failure) if they are the same, 0 (success) if they are 
different.
The GDB script breaks at "break_here", tries to change the value of d0 to some 
other
constant (4.0) and lets the program continue and exit.  If our register write 
succeeded,
the program should exit with 0 (values are different).  If our register write 
failed, the
program will exit with 1 (values are still the same).

The result is that I randomly see both cases, hinting to a race between the 
register write
and the time where the kernel restores the thread's vfp registers.  Note that 
when GDB's
affinity is pinned to a single core, I do not see the failure.  Also, note that 
when I
remove the vldr.64 instructions, I can't seem to reproduce the problem, so it 
looks
like they are somehow important.

I see this behavior on 3 different boards:

- ODroid XU-4, kernel 3.10.96
- Firefly RK3288, kernel 3.10.0
- Raspberry Pi 2, kernel 4.4.8

Any ideas about this problem?

Thanks,

Simon

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Re: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

8 matches

Site Navigation

Mail list logo

Footer information