Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
Thanks for pointing that out. Sorry for the false alarm.
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Wed, 09 Dec 2009 19:12:41 +0100, Oleg Nesterov wrote:
> > while the '.func_name' is the text address.
>
> tried to change the code to
>
>     REGS_ACCESS (regs, nip) = (unsigned long) .raise_sigusr2
>
> but gcc doesn't like this ;)
...

Yes, I verified the patch below fixes step-jump-cont.c on
ibm-js20-02.lab.bos.redhat.com.

Checked in a similar patch, in the same form as is now used in other
testcases, sorry for not using the patch of yours.

Regards,
Jan

--- step-jump-cont.c 8 Dec 2008 18:23:41 - 1.12
+++ step-jump-cont.c 14 Dec 2009 11:38:37 - 1.13
@@ -213,6 +213,24 @@ int main (void)
   REGS_ACCESS (regs, eip) = (unsigned long) raise_sigusr2;
 #elif defined __x86_64__
   REGS_ACCESS (regs, rip) = (unsigned long) raise_sigusr2;
+#elif defined __powerpc64__
+  {
+    /* ppc64 `raise_sigusr2' resolves to the function descriptor.  */
+    union
+      {
+        void (*f) (void);
+        struct
+          {
+            void *entry;
+            void *toc;
+          }
+        *p;
+      }
+    const func_u = { raise_sigusr2 };
+
+    REGS_ACCESS (regs, nip) = (unsigned long) func_u.p->entry;
+    REGS_ACCESS (regs, gpr[2]) = (unsigned long) func_u.p->toc;
+  }
 #elif defined __powerpc__
   REGS_ACCESS (regs, nip) = (unsigned long) raise_sigusr2;
 #else
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On 12/08, Ananth N Mavinakayanahalli wrote:
>
> On Mon, Dec 07, 2009 at 07:05:40PM +0100, Oleg Nesterov wrote:
> > On 12/07, Oleg Nesterov wrote:
> > >
> > > On 12/07, Jan Kratochvil wrote:
> > > >
> > > > On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
> > > > > But. raise_sigusr2 is not equal to the actual address of
> > > > > raise_sigusr2(), this value points to the thunk (I do not
> > > > > know the correct English term)
> > > >
> > > > ppc64 calls it function descriptor
> > > > (GDB ppc64_linux_convert_from_func_ptr_addr):
> > > > For PPC64, a function descriptor is a TOC entry,
> > >
> > > Thanks Jan.
> > >
> > > > in a data section,
> > >
> > > Yes! Now I can't understand how this test-case could ever work
> > > on ppc.
> > >
> > > step-jump-cont does:
> > >
> > >     regs->nip = raise_sigusr2; <--- points to data section
> > >     ptrace(PTRACE_CONT);
> > >
> > > of course, the tracee gets SIGSEGV, this section is not
> > > executable.
> >
> > Hmm. Looks like, powerpc means a lot of different hardware, and
> > _PAGE_EXEC may be 0. I didn't notice this when I quickly grepped
> > arch/powerpc/
> >
> > IOW, perhaps on some machines r implies x ? If yes, this can
> > explain why the results differ on different machines.
>
> Well, powerpc 32-bit adheres to the SVR4 ABI, while powerpc 64-bit
> uses the PPC64-ELF ABI (http://refspecs.linuxfoundation.org/ELF/ppc64/).
>
> The 64bit ABI uses function descriptors and the 'func_name' is the
> data address,

Cai, Ananth, thank you.

So. I think we can forget about the possible kernel problems (and in
any case we can rule out utrace).

The test-case is just wrong and should be fixed. The tracee can't
execute the function descriptor in the data section, that is why it
gets SIGSEGV.

> while the '.func_name' is the text address.

tried to change the code to

    REGS_ACCESS (regs, nip) = (unsigned long) .raise_sigusr2

but gcc doesn't like this ;)

> (See handle_rt_signal64 in arch/powerpc/kernel/signal_64.c and
> kprobe_lookup_name in arch/powerpc/include/asm/kprobes.h.)

Thanks... looking at handle_rt_signal64(), looks like we should also
set

    regs->gpr[2] = funct_desc_ptr->toc

if we change regs->nip

I hope someone who understands powerpc could fix the test-case ;)

Oleg.
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Mon, Dec 07, 2009 at 01:43:27PM +0100, Oleg Nesterov wrote:
> On 12/06, CAI Qian wrote:
>
> Ananth, could you please confirm once again that step-jump-cont (from
> ptrace-tests testsuite) not fail on your machine? If yes, please tell
> me the version of glibc/gcc. Is PTRACE_GETREGS defined on your machine?

Hi Oleg,

It works for me on a Fedora 12 machine.

[ana...@mjs22lp1 ptrace-tests]$ gcc --version
gcc (GCC) 4.4.2 20091027 (Red Hat 4.4.2-7)
[ana...@mjs22lp1 ptrace-tests]$ rpm -qa |grep glibc
glibc-common-2.11-2.ppc
glibc-2.11-2.ppc64
glibc-devel-2.11-2.ppc
glibc-static-2.11-2.ppc
glibc-2.11-2.ppc
glibc-devel-2.11-2.ppc64
glibc-headers-2.11-2.ppc

And yes, PTRACE_GETREGS is defined in /usr/include/asm/ptrace.h

Ananth
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Mon, Dec 07, 2009 at 07:05:40PM +0100, Oleg Nesterov wrote:
> On 12/07, Oleg Nesterov wrote:
> >
> > On 12/07, Jan Kratochvil wrote:
> > >
> > > On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
> > > > But. raise_sigusr2 is not equal to the actual address of
> > > > raise_sigusr2(), this value points to the thunk (I do not
> > > > know the correct English term)
> > >
> > > ppc64 calls it function descriptor
> > > (GDB ppc64_linux_convert_from_func_ptr_addr):
> > > For PPC64, a function descriptor is a TOC entry,
> >
> > Thanks Jan.
> >
> > > in a data section,
> >
> > Yes! Now I can't understand how this test-case could ever work
> > on ppc.
> >
> > step-jump-cont does:
> >
> >     regs->nip = raise_sigusr2; <--- points to data section
> >     ptrace(PTRACE_CONT);
> >
> > of course, the tracee gets SIGSEGV, this section is not executable.
>
> Hmm. Looks like, powerpc means a lot of different hardware, and
> _PAGE_EXEC may be 0. I didn't notice this when I quickly grepped
> arch/powerpc/
>
> IOW, perhaps on some machines r implies x ? If yes, this can explain
> why the results differ on different machines.

Well, powerpc 32-bit adheres to the SVR4 ABI, while powerpc 64-bit uses
the PPC64-ELF ABI (http://refspecs.linuxfoundation.org/ELF/ppc64/).

The 64bit ABI uses function descriptors and the 'func_name' is the data
address, while the '.func_name' is the text address.

(See handle_rt_signal64 in arch/powerpc/kernel/signal_64.c and
kprobe_lookup_name in arch/powerpc/include/asm/kprobes.h.)

Ananth
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
> I'll try to investigate, but currently I am all confused, and I suspect
> we have some user-space issues. If only I knew something about ppc...

Sorry for the confusion.

> Ananth, could you please confirm once again that step-jump-cont (from
> ptrace-tests testsuite) not fail on your machine? If yes, please tell
> me the version of glibc/gcc. Is PTRACE_GETREGS defined on your machine?

Funny enough. The above failure has only been seen on that particular
system so far. In fact, different PPC64 systems have different results
there (roland's git tree + your lockless patch).

ibm-js20-02.lab.bos.redhat.com

FAIL: watchpoint
ppc-dabr-race: ./../tests/ppc-dabr-race.c:141: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 16928 Aborted ${dir}$tst
FAIL: ppc-dabr-race
syscall-reset: ./../tests/syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed.
errno 14 (Bad address)
unexpected child status 67f
FAIL: syscall-reset
step-fork: ./../tests/step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 31144 Aborted ${dir}$tst
FAIL: step-fork

ibm-js22-02.rhts.bos.redhat.com
ibm-js12-04.rhts.bos.redhat.com
ibm-js12-05.rhts.bos.redhat.com

Looks like these failed only for syscall-reset and step-fork, as we
have discussed before.

I'll be reserving ibm-js20-02.lab.bos.redhat.com at the moment.

Thanks,
CAI Qian
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On 12/07, caiq...@redhat.com wrote:
>
> > Ananth, could you please confirm once again that step-jump-cont (from
> > ptrace-tests testsuite) not fail on your machine? If yes, please tell
> > me the version of glibc/gcc. Is PTRACE_GETREGS defined on your machine?
>
> Funny enough. The above failure only seen on that particular system so
> far. In fact, different PPC64 systems have different results there
> (roland's git tree + your lockless patch).

Great! thanks.

OK, I seem to understand what happens, but I can not explain WHY this
happens on that machine.

Once again. The tracer changes the tracee's instruction pointer to the
address of raise_sigusr2(), and resumes the tracee. The tracee gets
SIGSEGV right after that.

But. raise_sigusr2 is not equal to the actual address of
raise_sigusr2(), this value points to the thunk (I do not know the
correct English term) which contains the actual address:

    (gdb) disassemble 0x100118c0
    Dump of assembler code for function raise_sigusr2:
    0x100118c0 <raise_sigusr2+0>:  .long 0x0      SIGSEGV
    0x100118c4 <raise_sigusr2+4>:  .long 0x1ab0   aof raise_sigusr2()
    0x100118c8 <raise_sigusr2+8>:  .long 0x0

And!!! this thunk does NOT live in .text, and vma does NOT have
VM_EXEC bit!

    # cat /proc/30494/maps
    0010-0012 r-xp 00:00 0 [vdso]
    1000-1001 r-xp fd:00 59262 /root/TST/sjc
    1001-1002 rw-p fd:00 59262 /root/TST/sjc

That is why the tracee gets SIGSEGV, and this is correct.

Cai, perhaps you could give me access to another ppc machine where this
test does not fail?

Or, could you please run the trivial program below on that machine?

Oleg.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void my_func(void)
{
}

int main(void)
{
    char cmd[128];

    printf("ptr: %p\n", my_func);
    sprintf(cmd, "cat /proc/%d/maps", getpid());
    system(cmd);

    return 0;
}
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
> But. raise_sigusr2 is not equal to the actual address of
> raise_sigusr2(), this value points to the thunk (I do not know the
> correct English term)

ppc64 calls it function descriptor
(GDB ppc64_linux_convert_from_func_ptr_addr):
    For PPC64, a function descriptor is a TOC entry, in a data section,
    which contains three words: the first word is the address of the
    function, the second word is the TOC pointer (r2), and the third
    word is the static chain value.

(gdb) x/8gx 0x805b6f6258
0x805b6f6258 <open>:    0x00805b65cf68  0x00805b702ac0
0x805b6f6268 <open64>:  0x00805b65d010  0x00805b702ac0
(gdb) x/20i 0x00805b65cf68
0x805b65cf68 <.__GI___open>:    lwz   r10,-30432(r13)
0x805b65cf6c <.__GI___open+4>:  cmpwi r10,0
0x805b65cf70 <.__GI___open+8>:  bne-  0x805b65cf84 <.__GI___open+28>
(gdb) info sym 0x00805b702ac0
last_nip in section .bss

I was not aware there is any third word before and I do not see it
there.

Regards,
Jan
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On 12/07, Jan Kratochvil wrote:
>
> On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
> > But. raise_sigusr2 is not equal to the actual address of
> > raise_sigusr2(), this value points to the thunk (I do not know the
> > correct English term)
>
> ppc64 calls it function descriptor
> (GDB ppc64_linux_convert_from_func_ptr_addr):
> For PPC64, a function descriptor is a TOC entry,

Thanks Jan.

> in a data section,

Yes! Now I can't understand how this test-case could ever work on ppc.

step-jump-cont does:

    regs->nip = raise_sigusr2; <--- points to data section
    ptrace(PTRACE_CONT);

of course, the tracee gets SIGSEGV, this section is not executable.

Oleg.
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On 12/07, Oleg Nesterov wrote:
>
> On 12/07, Jan Kratochvil wrote:
> >
> > On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
> > > But. raise_sigusr2 is not equal to the actual address of
> > > raise_sigusr2(), this value points to the thunk (I do not know
> > > the correct English term)
> >
> > ppc64 calls it function descriptor
> > (GDB ppc64_linux_convert_from_func_ptr_addr):
> > For PPC64, a function descriptor is a TOC entry,
>
> Thanks Jan.
>
> > in a data section,
>
> Yes! Now I can't understand how this test-case could ever work on ppc.
>
> step-jump-cont does:
>
>     regs->nip = raise_sigusr2; <--- points to data section
>     ptrace(PTRACE_CONT);
>
> of course, the tracee gets SIGSEGV, this section is not executable.

Hmm. Looks like, powerpc means a lot of different hardware, and
_PAGE_EXEC may be 0. I didn't notice this when I quickly grepped
arch/powerpc/

IOW, perhaps on some machines r implies x ? If yes, this can explain
why the results differ on different machines.

Oleg.