Re: Tests Failures on PPC64

2009-12-14 Thread Jan Kratochvil
On Wed, 09 Dec 2009 19:31:52 +0100, Oleg Nesterov wrote:
 Hmm. it is obviously racy, static volatile unsigned started
 is not atomic and thus the main thread can hang doing
 
   while (started < THREADS);
 
 not that I think this explains the failure though.

Thanks, fixed (but the problem is not reproducible for me).


Regards,
Jan


--- ppc-dabr-race.c  8 Dec 2008 18:23:41 -0000  1.8
+++ ppc-dabr-race.c  14 Dec 2009 12:03:49 -0000  1.9
@@ -141,13 +141,14 @@ handler_fail (int signo)
   assert (0);
 }
 
+/* STARTED requires atomic access.  */
 static volatile unsigned started;
 
 static void *child_thread (void *data)
 {
   pid_t tid = gettid ();
 
-  started++;
+  __sync_add_and_fetch (&started, 1);
 
   /* We should stay in the syscall - better race probability.  */
   sleep (1);
@@ -178,7 +179,7 @@ static void child_func (void)
   assert (i == 0);
 }
 
-  while (started < THREADS);
+  while (__sync_add_and_fetch (&started, 0) < THREADS);
 
   l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
   assert (l == 0);
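
For readers following the patch, here is a minimal standalone sketch of
the same idea: worker threads bump a shared counter atomically and the
main thread polls it atomically. THREADS = 8 and run_threads() are
hypothetical names chosen for illustration; only the started counter and
the GCC builtin __sync_add_and_fetch come from the patch above.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 8   /* hypothetical count; the real test defines its own */

/* STARTED requires atomic access.  */
static volatile unsigned started;

static void *child_thread(void *data)
{
    (void) data;
    /* Atomic read-modify-write: concurrent increments cannot be lost.  */
    __sync_add_and_fetch(&started, 1);
    return NULL;
}

/* Start THREADS workers, wait until all of them have checked in, and
   return the final counter value.  */
static unsigned run_threads(void)
{
    pthread_t th[THREADS];
    int i;

    started = 0;
    for (i = 0; i < THREADS; i++) {
        int rc = pthread_create(&th[i], NULL, child_thread, NULL);
        assert(rc == 0);
    }

    /* Adding 0 gives an atomic read with a full memory barrier.  */
    while (__sync_add_and_fetch(&started, 0) < THREADS)
        ;

    for (i = 0; i < THREADS; i++)
        pthread_join(th[i], NULL);
    return started;
}
```

Built with gcc -pthread, run_threads() should return THREADS once every
worker has incremented the counter. With a plain started++ two threads
could read the same old value and one increment would be lost, which is
the hang described above.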



Re: Tests Failures on PPC64

2009-12-13 Thread Roland McGrath
 Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to
 use the hardware watchpoint.
 
 There is something strange though. gdb does PTRACE_SINGLESTEP and only
 then PTRACE_CONT after watch xxx.

powerpc's data breakpoints are before-access, whereas x86's are
after-access.  In x86-speak, it's a fault-type exception rather than a
trap-type.  The only way to actually get the caught load or store to
complete is to clear the DABR, single-step, and then restore it.
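
The clear / single-step / restore sequence can be sketched with a toy
model. This is not kernel or gdb code: struct toy_cpu, toy_store() and
toy_step_over() are hypothetical names, and a single array index stands
in for the DABR. On a real powerpc tracee the three steps would
presumably map to PTRACE_SET_DEBUGREG with 0, PTRACE_SINGLESTEP, and
PTRACE_SET_DEBUGREG with the old value.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one watched slot index stands in for the DABR;
   -1 means "no watchpoint armed".  */
struct toy_cpu {
    long mem[4];
    int dabr;
};

/* Before-access semantics: if the target slot is watched, report the
   hit and do NOT perform the store.  Returns true if the store ran.  */
static bool toy_store(struct toy_cpu *cpu, int idx, long val)
{
    assert(idx >= 0 && idx < 4);
    if (cpu->dabr == idx)
        return false;           /* trapped before the access */
    cpu->mem[idx] = val;
    return true;
}

/* What the debugger has to do after a hit so the store can complete.  */
static void toy_step_over(struct toy_cpu *cpu, int idx, long val)
{
    int saved = cpu->dabr;

    cpu->dabr = -1;             /* clear the DABR */
    toy_store(cpu, idx, val);   /* single-step: the store now completes */
    cpu->dabr = saved;          /* restore the DABR */
}
```

In this model a store to the watched slot leaves memory untouched until
toy_step_over() runs, whereas an after-access (x86-style) watchpoint
would report the hit with the new value already written.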


Thanks,
Roland



Re: Tests Failures on PPC64

2009-12-11 Thread Oleg Nesterov
On 12/11, K.Prasad wrote:

 On Thu, Dec 10, 2009 at 08:24:36PM +0100, Oleg Nesterov wrote:
 
  Oh well. I spent this day grepping arch/powerpc to understand how
  PTRACE_SET_DEBUGREG works and what is the problem. But I am afraid
  this time I need a help from someone who understands the hardware
  magic on powerpc.
 

 There's relatively less magic with PPC64 (with just one DABR) compared
 to x86 :-)

 I hope to offer a little help here (given that I work to tweak
 ptrace_set_debugreg() in PPC64 to use the hw-breakpoint interfaces)

Thanks, please see another email, I cc'ed you.

 Watchpoints (using DABR) through GDB can fail for many reasons...they
 must ideally be set after the program has started execution - to enable
 GDB to know the size of the variable...else they would resort to
 single-stepping to trap access to the target variable.

Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to
use the hardware watchpoint.

There is something strange though. gdb does PTRACE_SINGLESTEP and only
then PTRACE_CONT after watch xxx.

Where can one find the relevant piece of testcase?

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/watchpoint.c?cvsroot=systemtap

Oleg.



Re: Tests Failures on PPC64

2009-12-11 Thread K.Prasad
On Fri, Dec 11, 2009 at 04:59:44PM +0100, Oleg Nesterov wrote:
 On 12/11, K.Prasad wrote:
  Watchpoints (using DABR) through GDB can fail for many reasons...they
  must ideally be set after the program has started execution - to enable
  GDB to know the size of the variable...else they would resort to
  single-stepping to trap access to the target variable.
 
 Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREG to
 use the hardware watchpoint.
 
 There is something strange though. gdb does PTRACE_SINGLESTEP and only
 then PTRACE_CONT after watch xxx.


I haven't taken a good look at the testcase...although I suspect that
the use of PTRACE_SINGLESTEP vs PTRACE_SET_DEBUGREG during your trials
is due to the way "watch <var>" is being set.

For instance, here are two screenlogs taken from a PPC64 (Power5 box
running RHEL 5.3 2.6.18-128.el5). It can be seen that hw-breakpoints are
used only during the second-run of GDB vs single-stepping done in the
first...wondering if you used it in similar ways.

First run
---
[r...@p510 ~]# gdb prasad
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as ppc64-redhat-linux-gnu...
(gdb) watch b
Watchpoint 1: b    <-- ("Watchpoint", not "Hardware watchpoint")
(gdb) r
Starting program: /root/prasad 

Prasad: sizeof(long): sizeof(b)=4
Watchpoint 1: b

Old value = 0
New value = 200
main () at a.c:17
17  i = 300;
(gdb) c


Second run
---
[r...@p510 ~]# gdb prasad
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as ppc64-redhat-linux-gnu...
(gdb) start
Breakpoint 1 at 0x14bc: file a.c, line 14.
Starting program: /root/prasad 
main () at a.c:14
14  printf("\nPrasad: sizeof(long): sizeof(b)=%d\n", sizeof(b));
(gdb) watch b
Hardware watchpoint 2: b    <-- (uses DABR unlike the first run)
(gdb) q
The program is running.  Exit anyway? (y or n) y



Re: powerpc: PPC970FX dabr bug? (Was: Tests Failures on PPC64)

2009-12-11 Thread Oleg Nesterov
On 12/11, Oleg Nesterov wrote:

 For those who didn't read the whole thread, the test-case:
 http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/watchpoint.c?cvsroot=systemtap

or you can look at

https://www.redhat.com/archives/utrace-devel/2009-December/msg00096.html

it is very easy to reproduce the problem with gdb.

 On 12/10, Oleg Nesterov wrote:
 
  On 12/10, Oleg Nesterov wrote:
  
   On 12/09, CAI Qian wrote:
   
- Oleg Nesterov o...@redhat.com wrote:
   
 Thanks, but it doesn't fail for me on this machine...
   
Hmm, it failed for me.
   
# cd /root/ptrace-tests
   
# make check
...
FAIL: watchpoint
  
   OMG. Yet another test-case fails on powerpc  I didn't see this
   failure in the previous reports or missed it ...
  
   I bet it fails without utrace too? (please don't tell it doesn't ;)
  
   Did you see it fails on other ppc64 machines?
  
  
   Oh well. I spent this day grepping arch/powerpc to understand how
   PTRACE_SET_DEBUGREG works and what is the problem. But I am afraid
   this time I need a help from someone who understands the hardware
   magic on powerpc.
  
   So far:
  
 - the test-case looks correct to me
 
  OOPS.
 
  I am not sure, will re-check tomorrow. But it seems to me gcc
  optimizes out check = 1, despite the fact it is declared as
  volatile.
 
 No, I misread the asm (which I don't understand anyway). The tracee
 does write to check, and this is even seen by PTRACE_PEEKDATA.
 
 Looks like a hardware problem to me. For example, this patch
 
   --- watchpoint.c~   2009-12-11 15:32:14.0 +0100
   +++ watchpoint.c    2009-12-11 15:36:17.0 +0100
   @@ -144,7 +144,7 @@ handler_fail (int signo)
  raise (signo);
}

   -static volatile long long check;
   +volatile long long check;

int
main (void)
 
 fixes the problem. This one
 
   --- watchpoint.c~   2009-12-11 15:32:14.0 +0100
   +++ watchpoint.c    2009-12-11 15:38:10.0 +0100
   @@ -169,7 +169,7 @@ main (void)
   i = raise (SIGUSR1);
   assert (i == 0);

   -   check = 1;
   +   check = 0xfff;

   i = raise (SIGUSR2);
   assert (i == 0);
 
 helps too (any value which can't be immediate for powerpc works,
 unless I misinterpret asm again).
 
 
 I give up, this needs a help from powerpc experts. As a last resort
 I tried google,
 
   # grep cpu /proc/cpuinfo
   cpu : PPC970FX, altivec supported
   cpu : PPC970FX, altivec supported
 
 
 http://www.google.com/linux?q=powerpc+970FX+dabr+bug
 
 from http://lists.ozlabs.org/pipermail/linuxppc-dev/2008-March/052910.html
 
   Which is IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X
   and contains Erratum #8: DABRX register might not always be updated
   correctly:
 
   Projected Impact
     The data address breakpoint function might not always work.
   Workaround
     None.
   Status
     A fix is not planned at this time for the PowerPC 970FX.
 
 but this machine sets set_dabr = pseries_set_dabr(), not pseries_set_xdabr(),
 not sure this is relevant.
 
 Gurus, please help!
 
 Oleg.



Re: Tests Failures on PPC64

2009-12-10 Thread Oleg Nesterov
On 12/10, Oleg Nesterov wrote:

 On 12/09, CAI Qian wrote:
 
  - Oleg Nesterov o...@redhat.com wrote:
 
   Thanks, but it doesn't fail for me on this machine...
 
  Hmm, it failed for me.
 
  # cd /root/ptrace-tests
 
  # make check
  ...
  FAIL: watchpoint

 OMG. Yet another test-case fails on powerpc  I didn't see this
 failure in the previous reports or missed it ...

 I bet it fails without utrace too? (please don't tell it doesn't ;)

 Did you see it fails on other ppc64 machines?


 Oh well. I spent this day grepping arch/powerpc to understand how
 PTRACE_SET_DEBUGREG works and what is the problem. But I am afraid
 this time I need a help from someone who understands the hardware
 magic on powerpc.

 So far:

   - the test-case looks correct to me

OOPS.

I am not sure, will re-check tomorrow. But it seems to me gcc
optimizes out check = 1, despite the fact it is declared as
volatile.

Oleg.



Re: Tests Failures on PPC64

2009-12-10 Thread CAI Qian

 I bet it fails without utrace too? (please don't tell it doesn't ;)

Yes.

 Did you see it fails on other ppc64 machines?

No.

 Ah. I didn't notice you did biarch-check, not check.
 Will take a look later...

Thanks in advance,
CAI Qian



Re: Tests Failures on PPC64

2009-12-10 Thread K.Prasad
On Thu, Dec 10, 2009 at 08:24:36PM +0100, Oleg Nesterov wrote:
 On 12/09, CAI Qian wrote:
 
  - Oleg Nesterov o...@redhat.com wrote:
 
   Thanks, but it doesn't fail for me on this machine...
 
  Hmm, it failed for me.
 
  # cd /root/ptrace-tests
 
  # make check
  ...
  FAIL: watchpoint
 
 OMG. Yet another test-case fails on powerpc  I didn't see this
 failure in the previous reports or missed it ...
 
 I bet it fails without utrace too? (please don't tell it doesn't ;)
 
 Did you see it fails on other ppc64 machines?
 
 
 Oh well. I spent this day grepping arch/powerpc to understand how
 PTRACE_SET_DEBUGREG works and what is the problem. But I am afraid
 this time I need a help from someone who understands the hardware
 magic on powerpc.


There's relatively less magic with PPC64 (with just one DABR) compared
to x86 :-)

I hope to offer a little help here (given that I work to tweak
ptrace_set_debugreg() in PPC64 to use the hw-breakpoint interfaces)

Watchpoints (using DABR) through GDB can fail for many reasons...they
must ideally be set after the program has started execution - to enable
GDB to know the size of the variable...else they would resort to
single-stepping to trap access to the target variable.
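
The single-stepping fallback mentioned above can be illustrated with a
small toy sketch. It is an assumption-laden model, not gdb's actual
implementation: struct toy_insn and toy_soft_watch() are hypothetical
names, and each array element stands in for one instruction that a
debugger would execute via PTRACE_SINGLESTEP before re-reading the
watched variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one single-stepped instruction:
   store VAL into *DST.  */
struct toy_insn {
    long *dst;
    long val;
};

/* Software watchpoint: execute one instruction at a time and compare
   the watched variable before and after each step.  Returns the index
   of the first instruction that changed *WATCHED, or -1 if none did.  */
static int toy_soft_watch(const struct toy_insn *prog, size_t n,
                          const long *watched)
{
    size_t i;

    assert(watched != NULL);
    for (i = 0; i < n; i++) {
        long old = *watched;

        *prog[i].dst = prog[i].val;   /* one "single-step" */
        if (*watched != old)          /* re-read and compare */
            return (int) i;
    }
    return -1;
}
```

The cost is one stop and one comparison per instruction, which is why a
hardware watchpoint through the DABR is preferable when the debugger
already knows the address and size of the variable.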

Cai,
   Where can one find the relevant piece of testcase?

Thanks,
K.Prasad



Re: Tests Failures on PPC64

2009-12-09 Thread Oleg Nesterov
On 12/08, caiq...@redhat.com wrote:

 This is seen with and without CONFIG_UTRACE.

Good, at least we shouldn't worry about utrace.

 FAIL: watchpoint

 ppc-dabr-race: ./../tests/ppc-dabr-race.c:141: handler_fail: Assertion `0' failed.
 /bin/sh: line 5: 31750 Aborted   ${dir}$tst
 FAIL: ppc-dabr-race

 Are those known issues?

No, it is not. However I do not know what this test-case does,
and I know nothing about data watchpoints.

Hmm. it is obviously racy, static volatile unsigned started
is not atomic and thus the main thread can hang doing

while (started < THREADS);

not that I think this explains the failure though.


Cai, I tried to reproduce the failure on your machine but it
doesn't fail?

Oleg.



Tests Failures on PPC64

2009-12-08 Thread caiqian
This is seen with and without CONFIG_UTRACE.

FAIL: watchpoint

ppc-dabr-race: ./../tests/ppc-dabr-race.c:141: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 31750 Aborted   ${dir}$tst
FAIL: ppc-dabr-race

Are those known issues?

Thanks,
CAI Qian