Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-12-01 Thread Roland McGrath
We don't really intend to impose any new requirements on the arch behavior
here.  It's up to the arch folks to decide.  It does simplify life somewhat
on x86 that you can change the registers at the syscall-entry stop and then
after skipping the syscall, those registers will be unchanged from what you
set.  But it's never been that way on every other arch, and it doesn't need
to be.  The arch requirement on the tracehook_report_syscall_entry() return
value handling is that it make the syscall not be run, and that the
register state then left be compatible with using syscall_rollback() at the
tracehook_report_syscall_exit() spot.

As to what you get from ptrace explicitly fiddling with registers at
syscall entry, that has always been arch-specific before and we haven't
done anything to change that situation.  On every arch, there is some way
to change the syscall number to be run, and changing it to a known-bogus
number (e.g. -1) makes sure no syscall runs.  On every arch, it's possible
at the tracehook_report_syscall_exit() spot to change the registers to
exactly whatever you want userland to see.  That's enough as it stands to
make it possible to do whatever you want, some way or other.

If the powerpc maintainers want to change the behavior here, that is fine
by me.  But there is no need for that just to satisfy general ptrace
cleanups (or utrace).  Normal concerns require that no such change break
the ptrace behavior that userland could have relied on in the past.

So off hand I don't see a reason to change at all.  If every arch were to
change so that registers changed at syscall-entry were left unmolested by
aborting the syscall, then that might be a new consistency worth having.
But short of that, I don't really see a benefit.

All this implies that the ptrace-tests case relating to this needs to be
tailored differently for powerpc and each other arch so it expects and
verifies exactly the arch-specific behavior that's been seen in the past.


Thanks,
Roland



Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-12-01 Thread Benjamin Herrenschmidt
On Tue, 2009-12-01 at 11:27 -0800, Roland McGrath wrote:
 
 If the powerpc maintainers want to change the behavior here, that is fine
 by me.  But there is no need for that just to satisfy general ptrace
 cleanups (or utrace).  Normal concerns require that no such change break
 the ptrace behavior that userland could have relied on in the past.
 
 So off hand I don't see a reason to change at all.  If every arch were to
 change so that registers changed at syscall-entry were left unmolested by
 aborting the syscall, then that might be a new consistency worth having.
 But short of that, I don't really see a benefit.
 
 All this implies that the ptrace-tests case relating to this needs to be
 tailored differently for powerpc and each other arch so it expects and
 verifies exactly the arch-specific behavior that's been seen in the past. 

Ok thanks. I'm happy to not change it then, the risk of breaking some
existing assumption is too high in my book.

Cheers,
Ben.




Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-11-30 Thread Oleg Nesterov
On 11/30, Benjamin Herrenschmidt wrote:

 On Mon, 2009-11-30 at 10:15 +1100, Benjamin Herrenschmidt wrote:

  Yes, the asm should be changed. I suppose we could check if the result
  of do_syscall_trace_enter is negative, and if it is, branch to the exit
  path using r3 as the error code. Would that be ok ?
 
  Something like this:

 Note however that there's a trace exit too and that's normally the right
 place to alter the result don't you think ?

Yes, the result can be changed when the tracee reports syscall-exit.

Should powerpc allow this on syscall-entry? I do not know. x86 does,
and we have this test-case which assumes powerpc should allow too.
But when it comes to ptrace I can almost never know what was the
supposed behaviour/api.

Jan, Roland, what do you think?

Oleg.



Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-11-30 Thread Oleg Nesterov
On 11/30, Benjamin Herrenschmidt wrote:

 On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
 
  If I change the test-case to use NEWVAL == 1000 (or any other value
  greater than NR_syscalls), then the tracee sees ENOSYS and this is
  correct too.
 
  But I do not see how it is possible to change the retcode on powerpc.
  Unlike x86, powepc doesn't set -ENOSYS in advance, before doing
  do_syscall_trace_enter() logic. This means that if the tracer cancels
  syscall, r3 will be overwritten by syscall_enosys.
 
  This probably means the kernel should be fixed too, but I am not
  brave enough to change the asm which I can't understand ;)

 Yes, the asm should be changed. I suppose we could check if the result
 of do_syscall_trace_enter is negative, and if it is, branch to the exit
 path using r3 as the error code. Would that be ok ?

 Something like this:

 --- a/arch/powerpc/kernel/entry_64.S
 +++ b/arch/powerpc/kernel/entry_64.S
 @@ -240,6 +240,9 @@ syscall_dotrace:
   bl  .save_nvgprs
   addir3,r1,STACK_FRAME_OVERHEAD
   bl  .do_syscall_trace_enter
 + cmpdi   cr0,r3,0
 + blt syscall_exit
 +

Yes, but this doesn't allow to a) cancel this syscall and b) make it
return a non-negative result to the tracee.

Perhaps poweprc should set pt_regs-result = -ENOSYS before calling
do_syscall_trace_enter() like x86 does ? (in this case syscall_exit()
shouldn't change RESULT(r1)). This way the tracer can change both
pt_regs-result and gpr[0] independently.

Oleg.



Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-11-29 Thread Benjamin Herrenschmidt
On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
 On 11/28, Ananth N Mavinakayanahalli wrote:
 
  syscall-reset is the only failure I see on
  powerpc:
 
  errno 14 (Bad address)
  syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
  ()) == 38' failed.
  unexpected child status 67f
  FAIL: syscall-reset
 
 (to remind, it also fails without utrace)
 
 Once again, I know nothing about powerc, perhaps I misread the code,
 but I believe this test-case is just wrong on powerpc and should be
 fixed.
 
 On powerpc, syscall_get_nr() returns regs-gpr[0], this means this
 register is used to pass the syscall number.

Correct.

 This matches do_syscall_trace_enter(), it returns regs-gpr[0] as a
 (possibly changed by tracer) syscall nr.
 
 arch/powerpc/kernel/entry_64.S does
 
   syscall_dotrace:
 
bl  .do_syscall_trace_enter
mr  r0,r3  // I guess, r3 = r0 ?

r3 is the return value from a function so this replaces the
syscall number
...
b   syscall_dotrace_cont
 
   syscall_dotrace_cont:
 
   syscall_dotrace_cont:
 
   cmpldi  0,r0,NR_syscalls
   bge-syscall_enosys
 
   syscall_enosys:
 
   li  r3,-ENOSYS
   b   syscall_exit
 
 
 Now return to the test-case, syscall-reset.c. The tracee does
 l = syscall (-23, 1, 2, 3) and stops.
 
 The tracer does
 
   #define RETREG  offsetof(struct pt_regs, gpr[0])
   #define NEWVAL  ((long) ENOTTY)
 
   l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l);
 
 l == -23, this is correct, note syscall(-23) above.
 
   l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL);
 
 And expects the tracee will see NEWVAL==ENOTTY after return from
 the systame call.
 
 Of course this can't happen. We changed the syscall number, the
 new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly
 returns -EFAULT.
 
 -
 
 If I change the test-case to use NEWVAL == 1000 (or any other value
 greater than NR_syscalls), then the tracee sees ENOSYS and this is
 correct too.
 
 But I do not see how it is possible to change the retcode on powerpc.
 Unlike x86, powepc doesn't set -ENOSYS in advance, before doing
 do_syscall_trace_enter() logic. This means that if the tracer cancels
 syscall, r3 will be overwritten by syscall_enosys.
 
 This probably means the kernel should be fixed too, but I am not
 brave enough to change the asm which I can't understand ;)

Yes, the asm should be changed. I suppose we could check if the result
of do_syscall_trace_enter is negative, and if it is, branch to the exit
path using r3 as the error code. Would that be ok ?

Something like this:

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1175a85..7a88c88 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -419,6 +419,9 @@ syscall_dotrace:
stw r0,_TRAP(r1)
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_syscall_trace_enter
+   cmpwi   cr0,r3,0
+   blt ret_from_syscall
+
/*
 * Restore argument registers possibly just changed.
 * We use the return value of do_syscall_trace_enter
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9763267..ec709a7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -240,6 +240,9 @@ syscall_dotrace:
bl  .save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  .do_syscall_trace_enter
+   cmpdi   cr0,r3,0
+   blt syscall_exit
+
/*
 * Restore argument registers possibly just changed.
 * We use the return value of do_syscall_trace_enter


Cheers,
Ben.