Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-27 Thread Oleg Nesterov
On 11/27, Ananth N Mavinakayanahalli wrote:

 On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:

  Ananth, could you please run the test-case from the changelog
  below ? I do not really expect this can help, but just in case.

 Right, it doesn't help :-(

 GDB shows that the parent is forever stuck at wait().

Now this is interesting. Could you please double check the parent hangs
in wait() ?

This doesn't match the testing we did on powerpc machine with Veaceslav,
and I hoped the problem was already resolved?

Please see other emails in this thread.


Hmm. Fortunately I still have access to the testing machine.
Yes, according to gdb it looks as if it hangs in wait(), but this
is not true. You can strace gdb itself, or look at xxx_ctxt_switches
in /proc/pid_of_parent/status.

Better yet, do not use gdb at all. Just strace (without -f) the parent,
you should see it continues to trace the child and loops forever.

Oleg.
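
For readers who want to try the /proc check suggested above, here is a minimal sketch in C. It assumes the xxx_ctxt_switches fields are the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines of /proc/<pid>/status; the helper name read_status_counter is hypothetical and not part of any tool mentioned in this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the numeric value stored under "key:" in a file laid out like
 * /proc/<pid>/status, or -1 if the file or the key cannot be found.
 * (Hypothetical helper, written only for illustration.)
 */
long read_status_counter(const char *path, const char *key)
{
	char line[256];
	size_t klen = strlen(key);
	long value = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* Match "key:" at the start of the line only. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			/* strtol() skips the tab that follows the colon. */
			value = strtol(line + klen + 1, NULL, 10);
			break;
		}
	}

	fclose(f);
	return value;
}
```

Reading voluntary_ctxt_switches for the parent's pid twice, a second or so apart, should be enough: a process that is really blocked in wait() keeps the same count, while one that loops through wait() and ptrace() keeps increasing it.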



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-27 Thread Ananth N Mavinakayanahalli
On Fri, Nov 27, 2009 at 06:46:27PM +0100, Veaceslav Falico wrote:
 On Thu, Nov 26, 2009 at 11:37:03PM +0100, Oleg Nesterov wrote:
 
  Could you look at this
 
ptrace-copy_process-should-disable-stepping.patch
http://marc.info/?l=linux-mm-commits&m=125789789322573
 
  patch? It is not clear to me how we can modify the test-case to
  verify it fixes the original problem for powerpc.
 
 I modified the test-case, it confirms that
 ptrace-copy_process-should-disable-stepping.patch fixes the
 problem with TIF_SINGLESTEP copied by fork() on powerpc.
 
 Probably we need a similar fix for step-fork.c in ptrace-tests.
 
 Modified the original testcase to call fork via syscall(__NR_fork),
 to avoid the looping inside libc's fork() on powerpc.
 The parent singlesteps until he sees that the child has forked, after
 that the parent PTRACE_CONTs until the child exits.
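
The idea of calling fork through syscall() rather than through libc can be sketched in isolation as follows. This is only an illustration, not the ptrace-tests code: the helper names raw_fork and raw_fork_exit_code are hypothetical, and the clone() fallback is an assumption for ABIs that do not define __NR_fork.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Enter the kernel's fork path directly, skipping libc's fork() wrapper. */
static pid_t raw_fork(void)
{
#ifdef __NR_fork
	return (pid_t) syscall(__NR_fork);
#else
	/* Assumption: where no fork syscall exists, clone(SIGCHLD) stands in. */
	return (pid_t) syscall(__NR_clone, SIGCHLD, 0, 0, 0, 0);
#endif
}

/*
 * Fork with raw_fork(), let the child exit with `code`, and return the
 * exit code the parent observes (or -1 on any failure).
 */
int raw_fork_exit_code(int code)
{
	int status;
	pid_t pid = raw_fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);		/* child: leave immediately */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Because the wrapper is bypassed, none of libc's own work around fork() runs in the traced process, which is the point of the change to the test.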

Thanks Veaceslav. This works:

Index: ptrace-tests/tests/step-fork.c
===
--- ptrace-tests.orig/tests/step-fork.c
+++ ptrace-tests/tests/step-fork.c
@@ -29,6 +29,7 @@
 #include <unistd.h>
 #include <sys/wait.h>
 #include <string.h>
+#include <sys/syscall.h>
 #include <signal.h>
 
 #ifndef PTRACE_SINGLESTEP
@@ -78,7 +79,7 @@ main (int argc, char **argv)
   sigprocmask (SIG_BLOCK, &mask, NULL);
   ptrace (PTRACE_TRACEME);
   raise (SIGUSR1);
-  if (fork () == 0)
+  if (syscall(__NR_fork) == 0)
     {
       read (-1, NULL, 0);
       _exit (22);

Oleg,
With the above patch applied, syscall-reset is the only failure I see on
powerpc:

errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset
...

1 of 40 tests failed
(11 tests were not run)
Please report to utrace-devel@redhat.com


Ananth
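
The status words that appear in this thread (67f in the output above, 0x57f in the si_addr test case) are raw wait() values. A small sketch of how they decode with the standard macros follows; the helper names stop_signal and exit_code are hypothetical.

```c
#include <signal.h>
#include <sys/wait.h>

/* Return the stop signal encoded in a wait() status, or -1 if not stopped. */
int stop_signal(int status)
{
	return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}

/* Return the exit code encoded in a wait() status, or -1 if not exited. */
int exit_code(int status)
{
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On Linux the low byte 0x7f marks a stopped child and the next byte holds the signal number, so 0x67f is a stop on signal 6 (SIGABRT, matching the failed assertion) and 0x57f is a stop on signal 5 (SIGTRAP, the normal single-step report).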



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Veaceslav Falico
On Thu, Nov 26, 2009 at 06:25:24PM +0100, Oleg Nesterov wrote:
 On 11/26, Oleg Nesterov wrote:
 
  On 11/26, Ananth N Mavinakayanahalli wrote:
  
   step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
   /bin/sh: line 5: 17325 Aborted ${dir}$tst
   FAIL: step-fork
 
  Good to know, thanks again Ananth.
 
  I'll take a look. Since I know nothing about powerpc, I can't
  promise the quick fix ;)
 
  The bug was found by code inspection, but the fix is not trivial
  because it depends on arch/, and it turns out the arch-independent
  fix in
 
  ptrace-copy_process-should-disable-stepping.patch
  http://marc.info/?l=linux-mm-commits&m=125789789322573
 
  doesn't work.
 
 Just noticed the test-case fails in handler_fail(). Most probably
 this means it is killed by SIGALRM because either parent or child
 hang in wait(). Perhaps we have another (ppc specific?) bug, but
 currently I do not understand how this is possible, this should
 not be arch-dependent.

I can confirm that we have another bug on the ppc arch. The test case
below spins forever:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>

int main(void)
{
	int pid, status;

	if (!(pid = fork())) {
		assert(ptrace(PTRACE_TRACEME) == 0);
		kill(getpid(), SIGSTOP);

		if (!fork())
			return 0;

		printf("fork passed..\n");

		return 0;
	}

	for (;;) {
		assert(pid == wait(&status));
		if (WIFEXITED(status))
			break;
		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
	}

	printf("Parent exit.\n");

	return 0;
}

It doesn't hang; the parent keeps spinning and the test case never
prints anything. It seems fork() can't complete under
PTRACE_SINGLESTEP.

--
Veaceslav 



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Oleg Nesterov
Veaceslav doesn't have the time to continue, but he gave me
access to rhts machine ;)

The kernel is 2.6.31.6 btw.

On 11/26, Veaceslav Falico wrote:

  Just noticed the test-case fails in handler_fail(). Most probably
  this means it is killed by SIGALRM because either parent or child
  hang in wait(). Perhaps we have another (ppc specific?) bug, but
  currently I do not understand how this is possible, this should
  not be arch-dependent.

 I can confirm that we have another bug on the ppc arch. The test case
 below spins forever:

 [...]

 It doesn't hang; the parent keeps spinning and the test case never
 prints anything. It seems fork() can't complete under
 PTRACE_SINGLESTEP.

Yep, thanks a lot Veaceslav.

I modified this test-case to print si_addr:

int main(void)
{
	int pid, status;

	if (!(pid = fork())) {
		assert(ptrace(PTRACE_TRACEME) == 0);
		kill(getpid(), SIGSTOP);

		if (!fork())
			return 0;

		printf("fork passed..\n");

		return 0;
	}

	for (;;) {
		siginfo_t info;

		assert(pid == wait(&status));
		assert(status = 0x57f);

		assert(ptrace(PTRACE_GETSIGINFO, pid, 0, &info) == 0);
		printf("%p\n", info.si_addr);

		if (WIFEXITED(status))
			break;
		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
	}

	printf("Parent exit.\n");

	return 0;
}

the output is:

...
0xfedf880
0xfedf884
...
0xfedf96c
0xfedf970

this is fork which calls __GI__IO_list_lock

Dump of assembler code for function fork:
0x0fedf880 <fork+0>:    mflr    r0
...
0x0fedf96c <fork+236>:  li      r28,0
0x0fedf970 <fork+240>:  bl      0xfeacce0 <__GI__IO_list_lock>
Then it loops inside __GI__IO_list_lock

...
0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34

0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34

0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34
...

and so on forever,

Dump of assembler code for function __GI__IO_list_lock:
0x0feacce0 <__GI__IO_list_lock+0>:   mflr    r0
0x0feacce4 <__GI__IO_list_lock+4>:   stwu    r1,-32(r1)
0x0feacce8 <__GI__IO_list_lock+8>:   li      r11,0
0x0feaccec <__GI__IO_list_lock+12>:  bcl-    20,4*cr7+so,0xfeaccf0 <__GI__IO_list_lock+16>
0x0feaccf0 <__GI__IO_list_lock+16>:  li      r9,1
0x0feaccf4 <__GI__IO_list_lock+20>:  stw     r0,36(r1)
0x0feaccf8 <__GI__IO_list_lock+24>:  stw     r30,24(r1)
0x0feaccfc <__GI__IO_list_lock+28>:  mflr    r30
0x0feacd00 <__GI__IO_list_lock+32>:  stw     r31,28(r1)
0x0feacd04 <__GI__IO_list_lock+36>:  stw     r29,20(r1)
0x0feacd08 <__GI__IO_list_lock+40>:  addi    r29,r2,-29824
0x0feacd0c <__GI__IO_list_lock+44>:  addis   r30,r30,16
0x0feacd10 <__GI__IO_list_lock+48>:  addi    r30,r30,13060
0x0feacd14 <__GI__IO_list_lock+52>:  lwz     r31,-6436(r30)
0x0feacd18 <__GI__IO_list_lock+56>:  lwz     r0,8(r31)
0x0feacd1c <__GI__IO_list_lock+60>:  cmpw    cr7,r0,r29
0x0feacd20 <__GI__IO_list_lock+64>:  beq-    cr7,0xfeacd4c <__GI__IO_list_lock+108>

beg->  0x0feacd24 <__GI__IO_list_lock+68>:  lwarx   r0,0,r31
0x0feacd28 <__GI__IO_list_lock+72>:  cmpw    r0,r11
0x0feacd2c <__GI__IO_list_lock+76>:  bne-    0xfeacd38 <__GI__IO_list_lock+88>
0x0feacd30 <__GI__IO_list_lock+80>:  stwcx.  r9,0,r31
end->  0x0feacd34 <__GI__IO_list_lock+84>:  bne+    0xfeacd24 <__GI__IO_list_lock+68>

I don't even know whether this is user-space bug or kernel bug,
the asm above is the black magic for me.

Anyone who knows something about powerpc can give me a hint?

Oleg.



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Oleg Nesterov
On 11/26, Oleg Nesterov wrote:

 Then it loops inside __GI__IO_list_lock

   0xfeacd24
   0xfeacd28
   0xfeacd2c
   0xfeacd30
   0xfeacd34
   ...

 and so on forever,

   Dump of assembler code for function __GI__IO_list_lock:
   0x0feacce0 <__GI__IO_list_lock+0>:   mflr    r0
   0x0feacce4 <__GI__IO_list_lock+4>:   stwu    r1,-32(r1)
   0x0feacce8 <__GI__IO_list_lock+8>:   li      r11,0
   0x0feaccec <__GI__IO_list_lock+12>:  bcl-    20,4*cr7+so,0xfeaccf0 <__GI__IO_list_lock+16>
   0x0feaccf0 <__GI__IO_list_lock+16>:  li      r9,1
   0x0feaccf4 <__GI__IO_list_lock+20>:  stw     r0,36(r1)
   0x0feaccf8 <__GI__IO_list_lock+24>:  stw     r30,24(r1)
   0x0feaccfc <__GI__IO_list_lock+28>:  mflr    r30
   0x0feacd00 <__GI__IO_list_lock+32>:  stw     r31,28(r1)
   0x0feacd04 <__GI__IO_list_lock+36>:  stw     r29,20(r1)
   0x0feacd08 <__GI__IO_list_lock+40>:  addi    r29,r2,-29824
   0x0feacd0c <__GI__IO_list_lock+44>:  addis   r30,r30,16
   0x0feacd10 <__GI__IO_list_lock+48>:  addi    r30,r30,13060
   0x0feacd14 <__GI__IO_list_lock+52>:  lwz     r31,-6436(r30)
   0x0feacd18 <__GI__IO_list_lock+56>:  lwz     r0,8(r31)
   0x0feacd1c <__GI__IO_list_lock+60>:  cmpw    cr7,r0,r29
   0x0feacd20 <__GI__IO_list_lock+64>:  beq-    cr7,0xfeacd4c <__GI__IO_list_lock+108>

 beg-> 0x0feacd24 <__GI__IO_list_lock+68>:  lwarx   r0,0,r31
   0x0feacd28 <__GI__IO_list_lock+72>:  cmpw    r0,r11
   0x0feacd2c <__GI__IO_list_lock+76>:  bne-    0xfeacd38 <__GI__IO_list_lock+88>
   0x0feacd30 <__GI__IO_list_lock+80>:  stwcx.  r9,0,r31
 end-> 0x0feacd34 <__GI__IO_list_lock+84>:  bne+    0xfeacd24 <__GI__IO_list_lock+68>

 I don't even know whether this is user-space bug or kernel bug,
 the asm above is the black magic for me.

When I use gdb to step over __GI__IO_list_lock(), it doesn't loop.
I straced gdb and noticed that when the trace reaches

0x0feacd24: lwarx   r0,0,r31

gdb does PTRACE_CONT, not PTRACE_SINGLESTEP. After that the child
stops at 0x0feacd38, the next insn (isync).

 Anyone who knows something about powerpc can give me a hint?

Please ;)

Oleg.



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Paul Mackerras
Oleg Nesterov writes:

   0xfeacd24
   0xfeacd28
   0xfeacd2c
   0xfeacd30
   0xfeacd34
   ...
 
 and so on forever,
...
 beg-> 0x0feacd24 <__GI__IO_list_lock+68>:  lwarx   r0,0,r31
   0x0feacd28 <__GI__IO_list_lock+72>:  cmpw    r0,r11
   0x0feacd2c <__GI__IO_list_lock+76>:  bne-    0xfeacd38 <__GI__IO_list_lock+88>
   0x0feacd30 <__GI__IO_list_lock+80>:  stwcx.  r9,0,r31
 end-> 0x0feacd34 <__GI__IO_list_lock+84>:  bne+    0xfeacd24 <__GI__IO_list_lock+68>
 
 I don't even know whether this is user-space bug or kernel bug,
 the asm above is the black magic for me.

The lwarx and stwcx. work together to do an atomic update to the word
whose address is in r31.  They are like LL (load-linked) and SC
(store-conditional) on other architectures such as alpha.  Basically
the lwarx creates an internal reservation on the word pointed to by
r31 and loads its value into r0.  The stwcx. stores into that word but
only if the reservation still exists.  The reservation gets cleared
(in hardware) if any other cpu writes to that word in the meantime.
If the reservation did get cleared, the bne (branch if not equal)
instruction will be taken and we loop around to try again.

There is a difficulty when single-stepping through such a sequence
because the process of taking the single-step exception and returning
will clear the reservation.  Thus if you single-step through that
sequence it will never succeed.  I believe gdb has code to recognize
this kind of sequence and run through it without stopping until after
the bne, precisely to avoid this problem.

Paul.
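
The load-reserve / store-conditional loop described above has roughly the following shape in portable C. This is a sketch with C11 atomics, not the glibc source: the function name try_lock_word is hypothetical, and the comments map each step onto the five-instruction listing only loosely. On powerpc, compilers typically expand the weak compare-exchange into a lwarx/stwcx. pair.

```c
#include <stdatomic.h>

/*
 * Try to change *word from 0 to 1, the way the lwarx/stwcx. loop does.
 * Returns 1 if the store went through, 0 if the word was already non-zero.
 * (Hypothetical helper, written only for illustration.)
 */
int try_lock_word(atomic_int *word)
{
	for (;;) {
		int expected = 0;	/* the value compared against (r11) */

		/* Load, compare with 0, and conditionally store 1. */
		if (atomic_compare_exchange_weak(word, &expected, 1))
			return 1;	/* store-conditional succeeded */

		if (expected != 0)
			return 0;	/* value differed: the "bne-" exit */

		/*
		 * The value was 0 but the store failed anyway: the
		 * reservation was lost, so go round again ("bne+").
		 */
	}
}
```

The last branch is the one that matters for this thread: if every pass through the loop loses the reservation, as happens when each instruction is followed by a single-step exception, the loop never exits.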



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Andreas Schwab
Paul Mackerras pau...@samba.org writes:

 I believe gdb has code to recognize this kind of sequence and run
 through it without stopping until after the bne, precisely to avoid
 this problem.

See gdb/rs6000-tdep.c:ppc_deal_with_atomic_sequence.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.
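
To make the idea of recognizing such a sequence concrete, here is a simplified sketch in C. It is not gdb's code and makes no attempt to mirror ppc_deal_with_atomic_sequence; the function names and the MAX_SEQ_LEN limit are hypothetical. The masks follow the PowerPC encoding of lwarx (primary opcode 31, extended opcode 20) and stwcx. (primary opcode 31, extended opcode 150, record bit set).

```c
#include <stddef.h>
#include <stdint.h>

#define LWARX_MASK  0xfc0007feu	/* primary + extended opcode fields */
#define LWARX_INSN  0x7c000028u	/* opcode 31, extended opcode 20 */
#define STWCX_MASK  0xfc0007ffu	/* as above, plus the record bit */
#define STWCX_INSN  0x7c00012du	/* opcode 31, extended opcode 150, Rc=1 */
#define MAX_SEQ_LEN 16		/* hypothetical scan limit */

int is_lwarx(uint32_t insn)
{
	return (insn & LWARX_MASK) == LWARX_INSN;
}

int is_stwcx(uint32_t insn)
{
	return (insn & STWCX_MASK) == STWCX_INSN;
}

/*
 * If insns[0] is a lwarx, return the number of instructions up to and
 * including the matching stwcx.; otherwise (or if none is found within
 * the scan limit) return 0.
 */
size_t atomic_sequence_length(const uint32_t *insns, size_t count)
{
	size_t i;

	if (count == 0 || !is_lwarx(insns[0]))
		return 0;

	for (i = 1; i < count && i < MAX_SEQ_LEN; i++)
		if (is_stwcx(insns[i]))
			return i + 1;

	return 0;
}
```

A tracer that finds a non-zero length at the current pc could run the child to a point past the sequence instead of stepping one instruction at a time, which is the behaviour Oleg observed when gdb issued PTRACE_CONT at the lwarx.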



Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Ananth N Mavinakayanahalli
On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:
 I changed the subject. This bug has nothing to do with utrace,
 the kernel fails with or without these changes.
 
 On 11/26, Ananth N Mavinakayanahalli wrote:
 
  On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
   On 11/25, Ananth N Mavinakayanahalli wrote:
   
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 24803 Aborted ${dir}$tst
FAIL: step-fork
  
   This is expected. Should be fixed by
  
 ptrace-copy_process-should-disable-stepping.patch
  
   in -mm tree. (I am attaching this patch below just in case)
   I didn't mention this patch in this series because this bug
   is orthogonal to utrace/ptrace.
 
  The patch doesn't seem to fix the issue on powerpc:
 
  step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
  /bin/sh: line 5: 17325 Aborted ${dir}$tst
  FAIL: step-fork
 
 Good to know, thanks again Ananth.
 
 I'll take a look. Since I know nothing about powerpc, I can't
 promise the quick fix ;)
 
 The bug was found by code inspection, but the fix is not trivial
 because it depends on arch/, and it turns out the arch-independent
 fix in
 
   ptrace-copy_process-should-disable-stepping.patch
   http://marc.info/?l=linux-mm-commits&m=125789789322573
 
 doesn't work.
 
 Ananth, could you please run the test-case from the changelog
 below ? I do not really expect this can help, but just in case.

Right, it doesn't help :-(

GDB shows that the parent is forever stuck at wait().

Ananth