Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
We don't really intend to impose any new requirements on the arch behavior here. It's up to the arch folks to decide. It does simplify life somewhat on x86 that you can change the registers at the syscall-entry stop and then after skipping the syscall, those registers will be unchanged from what you set. But it's never been that way on every other arch, and it doesn't need to be. The arch requirement on the tracehook_report_syscall_entry() return value handling is that it make the syscall not be run, and that the register state then left be compatible with using syscall_rollback() at the tracehook_report_syscall_exit() spot. As to what you get from ptrace explicitly fiddling with registers at syscall entry, that has always been arch-specific before and we haven't done anything to change that situation. On every arch, there is some way to change the syscall number to be run, and changing it to a known-bogus number (e.g. -1) makes sure no syscall runs. On every arch, it's possible at the tracehook_report_syscall_exit() spot to change the registers to exactly whatever you want userland to see. That's enough as it stands to make it possible to do whatever you want, some way or other. If the powerpc maintainers want to change the behavior here, that is fine by me. But there is no need for that just to satisfy general ptrace cleanups (or utrace). Normal concerns require that no such change break the ptrace behavior that userland could have relied on in the past. So off hand I don't see a reason to change at all. If every arch were to change so that registers changed at syscall-entry were left unmolested by aborting the syscall, then that might be a new consistency worth having. But short of that, I don't really see a benefit. All this implies that the ptrace-tests case relating to this needs to be tailored differently for powerpc and each other arch so it expects and verifies exactly the arch-specific behavior that's been seen in the past. Thanks, Roland
Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
On Tue, 2009-12-01 at 11:27 -0800, Roland McGrath wrote: If the powerpc maintainers want to change the behavior here, that is fine by me. But there is no need for that just to satisfy general ptrace cleanups (or utrace). Normal concerns require that no such change break the ptrace behavior that userland could have relied on in the past. So off hand I don't see a reason to change at all. If every arch were to change so that registers changed at syscall-entry were left unmolested by aborting the syscall, then that might be a new consistency worth having. But short of that, I don't really see a benefit. All this implies that the ptrace-tests case relating to this needs to be tailored differently for powerpc and each other arch so it expects and verifies exactly the arch-specific behavior that's been seen in the past. Ok thanks. I'm happy to not change it then, the risk of breaking some existing assumption is too high in my book. Cheers, Ben.
Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
On 11/30, Benjamin Herrenschmidt wrote: On Mon, 2009-11-30 at 10:15 +1100, Benjamin Herrenschmidt wrote: Yes, the asm should be changed. I suppose we could check if the result of do_syscall_trace_enter is negative, and if it is, branch to the exit path using r3 as the error code. Would that be ok ? Something like this: Note however that there's a trace exit too and that's normally the right place to alter the result don't you think ? Yes, the result can be changed when the tracee reports syscall-exit. Should powerpc allow this on syscall-entry? I do not know. x86 does, and we have this test-case which assumes powerpc should allow too. But when it comes to ptrace I can almost never know what was the supposed behaviour/api. Jan, Roland, what do you think? Oleg.
Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
On 11/30, Benjamin Herrenschmidt wrote: On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote: If I change the test-case to use NEWVAL == 1000 (or any other value greater than NR_syscalls), then the tracee sees ENOSYS and this is correct too. But I do not see how it is possible to change the retcode on powerpc. Unlike x86, powepc doesn't set -ENOSYS in advance, before doing do_syscall_trace_enter() logic. This means that if the tracer cancels syscall, r3 will be overwritten by syscall_enosys. This probably means the kernel should be fixed too, but I am not brave enough to change the asm which I can't understand ;) Yes, the asm should be changed. I suppose we could check if the result of do_syscall_trace_enter is negative, and if it is, branch to the exit path using r3 as the error code. Would that be ok ? Something like this: --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -240,6 +240,9 @@ syscall_dotrace: bl .save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl .do_syscall_trace_enter + cmpdi cr0,r3,0 + blt syscall_exit + Yes, but this doesn't allow to a) cancel this syscall and b) make it return a non-negative result to the tracee. Perhaps poweprc should set pt_regs-result = -ENOSYS before calling do_syscall_trace_enter() like x86 does ? (in this case syscall_exit() shouldn't change RESULT(r1)). This way the tracer can change both pt_regs-result and gpr[0] independently. Oleg.
Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote: On 11/28, Ananth N Mavinakayanahalli wrote: syscall-reset is the only failure I see on powerpc: errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset (to remind, it also fails without utrace) Once again, I know nothing about powerc, perhaps I misread the code, but I believe this test-case is just wrong on powerpc and should be fixed. On powerpc, syscall_get_nr() returns regs-gpr[0], this means this register is used to pass the syscall number. Correct. This matches do_syscall_trace_enter(), it returns regs-gpr[0] as a (possibly changed by tracer) syscall nr. arch/powerpc/kernel/entry_64.S does syscall_dotrace: bl .do_syscall_trace_enter mr r0,r3 // I guess, r3 = r0 ? r3 is the return value from a function so this replaces the syscall number ... b syscall_dotrace_cont syscall_dotrace_cont: syscall_dotrace_cont: cmpldi 0,r0,NR_syscalls bge-syscall_enosys syscall_enosys: li r3,-ENOSYS b syscall_exit Now return to the test-case, syscall-reset.c. The tracee does l = syscall (-23, 1, 2, 3) and stops. The tracer does #define RETREG offsetof(struct pt_regs, gpr[0]) #define NEWVAL ((long) ENOTTY) l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l); l == -23, this is correct, note syscall(-23) above. l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL); And expects the tracee will see NEWVAL==ENOTTY after return from the systame call. Of course this can't happen. We changed the syscall number, the new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly returns -EFAULT. - If I change the test-case to use NEWVAL == 1000 (or any other value greater than NR_syscalls), then the tracee sees ENOSYS and this is correct too. But I do not see how it is possible to change the retcode on powerpc. Unlike x86, powepc doesn't set -ENOSYS in advance, before doing do_syscall_trace_enter() logic. This means that if the tracer cancels syscall, r3 will be overwritten by syscall_enosys. This probably means the kernel should be fixed too, but I am not brave enough to change the asm which I can't understand ;) Yes, the asm should be changed. I suppose we could check if the result of do_syscall_trace_enter is negative, and if it is, branch to the exit path using r3 as the error code. Would that be ok ? Something like this: diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 1175a85..7a88c88 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -419,6 +419,9 @@ syscall_dotrace: stw r0,_TRAP(r1) addir3,r1,STACK_FRAME_OVERHEAD bl do_syscall_trace_enter + cmpwi cr0,r3,0 + blt ret_from_syscall + /* * Restore argument registers possibly just changed. * We use the return value of do_syscall_trace_enter diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 9763267..ec709a7 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -240,6 +240,9 @@ syscall_dotrace: bl .save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl .do_syscall_trace_enter + cmpdi cr0,r3,0 + blt syscall_exit + /* * Restore argument registers possibly just changed. * We use the return value of do_syscall_trace_enter Cheers, Ben.