Re: Tests Failures on PPC64
On Wed, 09 Dec 2009 19:31:52 +0100, Oleg Nesterov wrote:
> Hmm. it is obviously racy, "static volatile unsigned started" is not
> atomic and thus the main thread can hang doing
> "while (started < THREADS);", not that I think this explains the
> failure though.

Thanks, fixed (but the problem is not reproducible for me).

Regards,
Jan

--- ppc-dabr-race.c	8 Dec 2008 18:23:41 -	1.8
+++ ppc-dabr-race.c	14 Dec 2009 12:03:49 -	1.9
@@ -141,13 +141,14 @@ handler_fail (int signo)
   assert (0);
 }
 
+/* STARTED requires atomic access.  */
 static volatile unsigned started;
 
 static void *child_thread (void *data)
 {
   pid_t tid = gettid ();
 
-  started++;
+  __sync_add_and_fetch (&started, 1);
 
   /* We should stay in the syscall - better race probability.  */
   sleep (1);
@@ -178,7 +179,7 @@ static void child_func (void)
       assert (i == 0);
     }
 
-  while (started < THREADS);
+  while (__sync_add_and_fetch (&started, 0) < THREADS);
 
   l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
   assert (l == 0);
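For anyone who wants to see the start-counter pattern from this patch in isolation, here is a minimal sketch. It assumes GCC's __sync builtins (the same ones the patch uses). The value of THREADS and the helper names mark_started() and all_started() are made up for illustration and are not taken from ppc-dabr-race.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real THREADS in ppc-dabr-race.c may differ. */
#define THREADS 4

/* Shared start counter.  Every access goes through a __sync builtin. */
static volatile unsigned started;

/* Hypothetical helper, called once by each child thread.  The atomic
   read-modify-write cannot lose an update the way "started++" can when
   two threads increment concurrently.  Returns the new count.  */
unsigned
mark_started (void)
{
  unsigned n = __sync_add_and_fetch (&started, 1);

  /* More starts than threads would indicate a bug in the caller.  */
  assert (n <= THREADS);
  return n;
}

/* Hypothetical helper for the main thread.  Adding 0 leaves the counter
   unchanged and acts as an atomic read with a full memory barrier.  */
bool
all_started (void)
{
  return __sync_add_and_fetch (&started, 0) >= THREADS;
}
```

The main thread would then spin with "while (!all_started ());" before calling ptrace (PTRACE_TRACEME, ...), which is the same wait the patched test performs.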
Re: Tests Failures on PPC64
> Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to use
> the hardware watchpoint.
>
> There is something strange though. gdb does PTRACE_SINGLESTEP and only
> then PTRACE_CONT after "watch xxx".

powerpc's data breakpoints are before-access, whereas x86's are
after-access. In x86-speak, it's a fault-type exception rather than a
trap-type. The only way to actually get the caught load or store to
complete is to clear the DABR, single-step, and then restore it.

Thanks,
Roland
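The clear / single-step / restore sequence described above can be sketched as follows. This is only an illustration under stated assumptions: struct dabr_ops and the logging backend are hypothetical and exist so the ordering can be exercised without a powerpc tracee. In a real tracer the two hooks would presumably wrap ptrace (PTRACE_SET_DEBUGREG, pid, 0, value) and ptrace (PTRACE_SINGLESTEP, pid, 0, 0) followed by waitpid ().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracer backend, not a real API.  */
struct dabr_ops
{
  int (*set_dabr) (void *ctx, unsigned long value);
  int (*single_step) (void *ctx);
};

/* Let the load or store that hit a before-access watchpoint complete:
   clear the DABR, single-step, restore the DABR.  Returns 0 on success
   and nonzero as soon as one of the steps fails.  */
int
step_over_watchpoint (const struct dabr_ops *ops, void *ctx,
                      unsigned long dabr)
{
  assert (ops != NULL);
  if (ops->set_dabr (ctx, 0) != 0)
    return -1;
  if (ops->single_step (ctx) != 0)
    return -1;
  return ops->set_dabr (ctx, dabr);
}

/* Recording backend, for illustration only.  */
struct op_log
{
  char trace[8];        /* 'c' = clear, 's' = step, 'r' = restore */
  size_t len;
  unsigned long dabr;   /* DABR value as seen by the fake tracee */
};

static int
record (struct op_log *log, char op)
{
  if (log->len + 1 >= sizeof log->trace)
    return -1;
  log->trace[log->len++] = op;
  log->trace[log->len] = '\0';
  return 0;
}

int
log_set_dabr (void *ctx, unsigned long value)
{
  struct op_log *log = ctx;

  log->dabr = value;
  return record (log, value == 0 ? 'c' : 'r');
}

int
log_single_step (void *ctx)
{
  return record (ctx, 's');
}

const struct dabr_ops logging_ops = { log_set_dabr, log_single_step };
```

This ordering would also account for the PTRACE_SINGLESTEP followed by PTRACE_CONT that shows up in the strace of gdb.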
Re: Tests Failures on PPC64
On 12/11, K.Prasad wrote:
>
> On Thu, Dec 10, 2009 at 08:24:36PM +0100, Oleg Nesterov wrote:
> >
> > Oh well. I spent this day grepping arch/powerpc to understand
> > how PTRACE_SET_DEBUGREG works and what is the problem.
> >
> > But I am afraid this time I need a help from someone who understands
> > the hardware magic on powerpc.
>
> There's relatively less magic with PPC64 (with just one DABR) compared
> to x86 :-) I hope to offer a little help here (given that I work to
> tweak ptrace_set_debugreg() in PPC64 to use the hw-breakpoint
> interfaces)

Thanks, please see another email, I cc'ed you.

> Watchpoints (using DABR) through GDB can fail for many reasons...they
> must ideally be set after the program has started execution - to
> enable GDB know the size of the variable...else they would resort to
> single-stepping to trap access to the target variable.

Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to use
the hardware watchpoint.

There is something strange though. gdb does PTRACE_SINGLESTEP and only
then PTRACE_CONT after "watch xxx".

> Where can one find the relevant piece of testcase?

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/watchpoint.c?cvsroot=systemtap

Oleg.
Re: Tests Failures on PPC64
On Fri, Dec 11, 2009 at 04:59:44PM +0100, Oleg Nesterov wrote:
> On 12/11, K.Prasad wrote:
> >
> > Watchpoints (using DABR) through GDB can fail for many reasons...they
> > must ideally be set after the program has started execution - to
> > enable GDB know the size of the variable...else they would resort to
> > single-stepping to trap access to the target variable.
>
> Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to use
> the hardware watchpoint.
>
> There is something strange though. gdb does PTRACE_SINGLESTEP and only
> then PTRACE_CONT after "watch xxx".

I haven't taken a good look at the testcase...although I suspect that
the use of PTRACE_SINGLESTEP vs PTRACE_SET_DEBUGREG during your trials
is due to the way "watch var" is being set.

For instance, here are two screenlogs taken from a PPC64 (Power5 box
running RHEL 5.3 2.6.18-128.el5). It can be seen that hw-breakpoints are
used only during the second run of GDB vs single-stepping done in the
first...wondering if you used it in similar ways.

First run
---
[r...@p510 ~]# gdb prasad
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...
(gdb) watch b
Watchpoint 1: b            (Watchpoint vs Hardware watchpoint)
(gdb) r
Starting program: /root/prasad

Prasad: sizeof(long): sizeof(b)=4
Watchpoint 1: b

Old value = 0
New value = 200
main () at a.c:17
17        i = 300;
(gdb) c

Second run
---
[r...@p510 ~]# gdb prasad
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...
(gdb) start
Breakpoint 1 at 0x14bc: file a.c, line 14.
Starting program: /root/prasad
main () at a.c:14
14        printf("\nPrasad: sizeof(long): sizeof(b)=%d\n", sizeof(b));
(gdb) watch b
Hardware watchpoint 2: b   <--(uses DABR unlike the first run)
(gdb) q
The program is running.  Exit anyway? (y or n) y
Re: powerpc: PPC970FX dabr bug? (Was: Tests Failures on PPC64)
On 12/11, Oleg Nesterov wrote:

For those who didn't read the whole thread, the test-case:

	http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/watchpoint.c?cvsroot=systemtap

or you can look at

	https://www.redhat.com/archives/utrace-devel/2009-December/msg00096.html

it is very easy to reproduce the problem with gdb.

On 12/10, Oleg Nesterov wrote:
>
> On 12/10, Oleg Nesterov wrote:
> >
> > On 12/09, CAI Qian wrote:
> > >
> > > - Oleg Nesterov o...@redhat.com wrote:
> > >
> > > > Thanks, but it doesn't fail for me on this machine...
> > >
> > > Hmm, it failed for me.
> > >
> > > # cd /root/ptrace-tests
> > > # make check
> > > ...
> > > FAIL: watchpoint
> >
> > OMG. Yet another test-case fails on powerpc
> >
> > I didn't see this failure in the previous reports or missed it ...
> >
> > I bet it fails without utrace too? (please don't tell it doesn't ;)
> >
> > Did you see it fails on other ppc64 machines?
> >
> > Oh well. I spent this day grepping arch/powerpc to understand
> > how PTRACE_SET_DEBUGREG works and what is the problem.
> >
> > But I am afraid this time I need a help from someone who understands
> > the hardware magic on powerpc.
> >
> > So far:
> >
> > 	- the test-case looks correct to me
>
> OOPS. I am not sure, will re-check tomorrow. But it seems to me gcc
> optimizes out "check = 1", despite the fact it is declared as volatile.

No, I misread the asm (which I don't understand anyway). The tracee
does write to "check", and this is even seen by PTRACE_PEEKDATA.

Looks like a hardware problem to me. For example, this patch

--- watchpoint.c~	2009-12-11 15:32:14.0 +0100
+++ watchpoint.c	2009-12-11 15:36:17.0 +0100
@@ -144,7 +144,7 @@ handler_fail (int signo)
   raise (signo);
 }
 
-static volatile long long check;
+volatile long long check;
 
 int
 main (void)

fixes the problem. This one

--- watchpoint.c~	2009-12-11 15:32:14.0 +0100
+++ watchpoint.c	2009-12-11 15:38:10.0 +0100
@@ -169,7 +169,7 @@ main (void)
   i = raise (SIGUSR1);
   assert (i == 0);
 
-  check = 1;
+  check = 0xfff;
 
   i = raise (SIGUSR2);
   assert (i == 0);

helps too (any value which can't be immediate for powerpc works, unless
I misinterpret asm again).

I give up, this needs a help from powerpc experts. As a last resort I
tried google,

	# grep cpu /proc/cpuinfo
	cpu		: PPC970FX, altivec supported
	cpu		: PPC970FX, altivec supported

	http://www.google.com/linux?q=powerpc+970FX+dabr+bug

from http://lists.ozlabs.org/pipermail/linuxppc-dev/2008-March/052910.html

	Which is "IBM PowerPC 970FX RISC Microprocessor Errata List for
	DD3.X" and contains "Erratum #8: DABRX register might not always
	be updated correctly":

	Projected Impact
		The data address breakpoint function might not always work.
	Workaround
		None.
	Status
		A fix is not planned at this time for the PowerPC 970FX.

but this machine sets set_dabr = pseries_set_dabr(), not
pseries_set_xdabr(), not sure this is relevant.

Gurus, please help!

Oleg.
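For readers less familiar with the register under discussion, here is a rough sketch of how a DABR value is composed before it is handed to PTRACE_SET_DEBUGREG. It assumes the low three bits are the read, write and translation flags and that the hardware matches on the aligned doubleword address. That layout is my understanding and should be checked against the processor manual. make_dabr() is a hypothetical helper that does not appear in watchpoint.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the low three DABR bits (verify before relying on it). */
#define DABR_DATA_READ   (1UL << 0)
#define DABR_DATA_WRITE  (1UL << 1)
#define DABR_TRANSLATION (1UL << 2)
#define DABR_FLAGS_MASK  7UL

/* Hypothetical helper: build a DABR value that watches the aligned
   doubleword containing ADDR with the given access FLAGS.  */
unsigned long
make_dabr (const volatile void *addr, unsigned long flags)
{
  unsigned long base;

  /* Only the three flag bits may be set by the caller.  */
  assert ((flags & ~DABR_FLAGS_MASK) == 0);

  /* Drop the low three address bits; they are reused for the flags.  */
  base = (unsigned long) (uintptr_t) addr & ~DABR_FLAGS_MASK;
  return base | flags;
}
```

A tracer would pass something like make_dabr (&check, DABR_DATA_WRITE | DABR_TRANSLATION) as the data argument of PTRACE_SET_DEBUGREG. Whether the test-case uses exactly these flags is not shown in this thread.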
Re: Tests Failures on PPC64
On 12/10, Oleg Nesterov wrote:
>
> On 12/09, CAI Qian wrote:
> >
> > - Oleg Nesterov o...@redhat.com wrote:
> >
> > > Thanks, but it doesn't fail for me on this machine...
> >
> > Hmm, it failed for me.
> >
> > # cd /root/ptrace-tests
> > # make check
> > ...
> > FAIL: watchpoint
>
> OMG. Yet another test-case fails on powerpc
>
> I didn't see this failure in the previous reports or missed it ...
>
> I bet it fails without utrace too? (please don't tell it doesn't ;)
>
> Did you see it fails on other ppc64 machines?
>
> Oh well. I spent this day grepping arch/powerpc to understand
> how PTRACE_SET_DEBUGREG works and what is the problem.
>
> But I am afraid this time I need a help from someone who understands
> the hardware magic on powerpc.
>
> So far:
>
> 	- the test-case looks correct to me

OOPS. I am not sure, will re-check tomorrow. But it seems to me gcc
optimizes out "check = 1", despite the fact it is declared as volatile.

Oleg.
Re: Tests Failures on PPC64
> I bet it fails without utrace too? (please don't tell it doesn't ;)

Yes.

> Did you see it fails on other ppc64 machines?

No.

> Ah. I didn't notice you did biarch-check, not check. Will take a look
> later...

Thanks in advance,
CAI Qian
Re: Tests Failures on PPC64
On Thu, Dec 10, 2009 at 08:24:36PM +0100, Oleg Nesterov wrote:
> On 12/09, CAI Qian wrote:
> >
> > - Oleg Nesterov o...@redhat.com wrote:
> >
> > > Thanks, but it doesn't fail for me on this machine...
> >
> > Hmm, it failed for me.
> >
> > # cd /root/ptrace-tests
> > # make check
> > ...
> > FAIL: watchpoint
>
> OMG. Yet another test-case fails on powerpc
>
> I didn't see this failure in the previous reports or missed it ...
>
> I bet it fails without utrace too? (please don't tell it doesn't ;)
>
> Did you see it fails on other ppc64 machines?
>
> Oh well. I spent this day grepping arch/powerpc to understand
> how PTRACE_SET_DEBUGREG works and what is the problem.
>
> But I am afraid this time I need a help from someone who understands
> the hardware magic on powerpc.

There's relatively less magic with PPC64 (with just one DABR) compared
to x86 :-) I hope to offer a little help here (given that I work to
tweak ptrace_set_debugreg() in PPC64 to use the hw-breakpoint
interfaces)

Watchpoints (using DABR) through GDB can fail for many reasons...they
must ideally be set after the program has started execution - to enable
GDB know the size of the variable...else they would resort to
single-stepping to trap access to the target variable.

Cai,
	Where can one find the relevant piece of testcase?

Thanks,
K.Prasad
Re: Tests Failures on PPC64
On 12/08, caiq...@redhat.com wrote:
>
> This is seen with and without CONFIG_UTRACE.

Good, at least we shouldn't worry about utrace.

> FAIL: watchpoint
> ppc-dabr-race: ./../tests/ppc-dabr-race.c:141: handler_fail: Assertion `0' failed.
> /bin/sh: line 5: 31750 Aborted ${dir}$tst
> FAIL: ppc-dabr-race
>
> Are those known issues?

No, it is not. However I do not know what this test-case does, and I
know nothing about data watchpoints.

Hmm. it is obviously racy, "static volatile unsigned started" is not
atomic and thus the main thread can hang doing
"while (started < THREADS);", not that I think this explains the
failure though.

Cai, I tried to reproduce the failure on your machine but it doesn't
fail?

Oleg.
Tests Failures on PPC64
This is seen with and without CONFIG_UTRACE.

FAIL: watchpoint
ppc-dabr-race: ./../tests/ppc-dabr-race.c:141: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 31750 Aborted ${dir}$tst
FAIL: ppc-dabr-race

Are those known issues?

Thanks,
CAI Qian