Re: gdbstub initial code, v11

2010-09-22 Thread Jan Kratochvil
On Wed, 22 Sep 2010 21:09:12 +0200, Tom Tromey wrote:
 I think it would be good to implement a feature that shows how this
 approach is an improvement over the current state of gdb+ptrace or
 gdb+gdbserver.
 
 Exactly what feature this should be... I don't know :-)
 I would imagine something performance-related.

I would bet on a testcase that massively creates and deletes threads and
signals tasks around, together with watchpoints.  There are races in the
linux-nat code and IIRC even in the gdbserver code.

OTOH, if one tries hard, one can probably manage one day to fix all the corner
cases in the ptrace-based linux-nat and gdbserver.


Regards,
Jan



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-20 Thread Jan Kratochvil
On Mon, 20 Sep 2010 21:42:19 +0200, Oleg Nesterov wrote:
 Test-case:

Checked in http://sourceware.org/systemtap/wiki/utrace/tests as step-detach.


Thanks,
Jan



Re: gdbstub initial code, v9

2010-09-09 Thread Jan Kratochvil
Hi Oleg,

kernel-devel-2.6.34.6-54.fc13.x86_64 (real F13) says:

ugdb.c:1988: error: implicit declaration of function ‘hex_to_bin’


Jan



Re: gdbstub initial code, v9

2010-09-09 Thread Jan Kratochvil
On Thu, 09 Sep 2010 18:30:31 +0200, Oleg Nesterov wrote:
 OOPS! indeed, unhex() confuses lo and hi.

It works for 0xcc, though.
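0xcc hides the swap because both of its nibbles are equal.  A minimal sketch of the decoding order, assuming remote-protocol hex pairs where the first character is the high nibble.  `hex_digit_value()` is a hypothetical local helper with the same contract as the kernel's `hex_to_bin()` (value of one hex digit, or -1), i.e. the kind of fallback a module needs when the kernel headers lack `hex_to_bin()`, as in the 2.6.34 build error above.

```c
/* Hypothetical fallback with the same contract as the kernel's
 * hex_to_bin(): return the value of one hex digit, or -1.  */
static int hex_digit_value(char ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F')
        return ch - 'A' + 10;
    return -1;
}

/* Decode two hex characters into one byte.  The FIRST character is the
 * high nibble; swapping them only goes unnoticed for bytes like 0xcc.
 * Returns -1 on malformed input.  */
static int unhex_byte(const char *s)
{
    int hi = hex_digit_value(s[0]);
    int lo;

    if (hi < 0)
        return -1;
    lo = hex_digit_value(s[1]);
    if (lo < 0)
        return -1;
    return (hi << 4) | lo;
}
```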

 Cough... could you tell me how can I change the variable done
 without printing it?

(gdb) help set variable 
Evaluate expression EXP and assign result to variable VAR, using assignment
syntax appropriate for the current language (VAR = EXP or VAR := EXP for
example).  VAR may be a debugger convenience variable (names starting
with $), a register (a few standard names starting with $), or an actual
variable in the program being debugged.  EXP is any valid expression.
This may usually be abbreviated to simply set.


Regards,
Jan



Re: gdbstub initial code, v7

2010-09-03 Thread Jan Kratochvil
On Thu, 02 Sep 2010 22:06:32 +0200, Oleg Nesterov wrote:
 I assume that qXfer:siginfo:read always mean Hg thread.

It seems so.

 It is not clear to me what should ugdb report if there is no a valid
 siginfo.  linux_xfer_siginfo() return E01, but gdbserver uses SIGSTOP to
 stop the tracee,

I find an error reply more appropriate in such a case.

 Likewise, it is not clear what should ugdb do if gdb sends
 $CSIG in this case.

Currently GDB does not do anything special.  That is, if there is siginfo for
signal SIGUSR1 but one sends $C0B (SIGSEGV), does ptrace reset the siginfo, or
is the SIGUSR1 siginfo left in place for SIGSEGV?

 But this all is minor, I think.

As this is still being discussed for GDB, I think it is enough to just make
$_siginfo accessible without these details.


Thanks,
Jan



Re: gdbstub initial code, v7

2010-09-03 Thread Jan Kratochvil
On Fri, 03 Sep 2010 21:59:06 +0200, Roland McGrath wrote:
  Currently GDB does not do anything special, that is if there is siginfo for
  signal SIGUSR1 but one does $C0B (SIGSEGV) does ptrace reset the siginfo or is
  left the SIGUSR1 siginfo for SIGSEGV?
 
 The kernel considers this sloppy behavior on the debugger's part.  If
 you inject a different signal, we expect you should PTRACE_SETSIGINFO
 to something appropriate, or else that you really didn't care about
 the bits being accurate.  If the resumption signal does not match the
 siginfo_t.si_signo, then the kernel resets the siginfo as if the
 debugger had just used kill with the new signal (i.e. si_pid, si_uid
 point to the ptracer).

OK, that seems to me like the best choice.  Sorry I did not test/read it.
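Roland's rule can be exercised from the tracer side with `PTRACE_GETSIGINFO`/`PTRACE_SETSIGINFO`.  A minimal sketch, assuming Linux and glibc's `<sys/ptrace.h>`; `inject_signal_with_siginfo()` and `check_siginfo()` are made-up names.  The child stops on SIGUSR1 from `raise()`; the tracer changes only `si_signo` and resumes with SIGUSR2.  Because the injected signal now matches `si_signo`, the kernel should keep the supplied siginfo, which the child detects by `si_pid` still being its own pid (set by `raise()`) rather than the tracer's.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: exit 0 if the delivered siginfo is the tracer-supplied one
 * (SIGUSR2, but si_pid still the child's own pid from raise()).  */
static void check_siginfo(int signo, siginfo_t *info, void *ctx)
{
    (void) signo;
    (void) ctx;
    _exit(info->si_signo == SIGUSR2 && info->si_pid == getpid() ? 0 : 1);
}

/* Tracer side: returns the child's exit status, or -1 on error.  */
static int inject_signal_with_siginfo(void)
{
    int status;
    siginfo_t si;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = check_siginfo;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGUSR2, &sa, NULL);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(100);
        raise(SIGUSR1);     /* The tracer sees this signal stop.  */
        _exit(101);         /* Not reached if SIGUSR2 is injected.  */
    }

    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status)
        || WSTOPSIG(status) != SIGUSR1)
        return -1;
    if (ptrace(PTRACE_GETSIGINFO, child, NULL, &si) != 0)
        return -1;

    /* Keep the rest of the siginfo but make si_signo match the signal
     * being injected, so the kernel does not reset it.  */
    si.si_signo = SIGUSR2;
    if (ptrace(PTRACE_SETSIGINFO, child, NULL, &si) != 0
        || ptrace(PTRACE_CONT, child, NULL, (void *) (long) SIGUSR2) != 0)
        return -1;

    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Dropping the `PTRACE_SETSIGINFO` call should make the handler see the tracer's pid in `si_pid` (the `kill`-like reset described above), so the function would then return 1.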


Thanks,
Jan



Re: gdbstub initial code, v7

2010-08-31 Thread Jan Kratochvil
On Mon, 30 Aug 2010 21:20:40 +0200, Jan Kratochvil wrote:
 On Mon, 30 Aug 2010 20:58:50 +0200, Oleg Nesterov wrote:
  - report signals. A bit more code changes than I expected.
 
 BTW not sure if it is already the right time for it but to keep ugdb on-par
 with my linux-nat's re-post today (still not accepted in FSF GDB)

That's not true: this functionality needs no gdb/remote.c changes and its
correctness relies just on ugdb (and it is probably not a problem for ugdb).


 ugdb should support qXfer:siginfo, currently accessible only via $_siginfo
 print/set, though.

I am still sure this feature should also be implemented one day.


Thanks,
Jan



Re: gdbstub initial code, v7

2010-08-30 Thread Jan Kratochvil
On Mon, 30 Aug 2010 20:58:50 +0200, Oleg Nesterov wrote:
   - report signals. A bit more code changes than I expected.

BTW, I am not sure if it is already the right time for it, but to keep ugdb on
par with my linux-nat re-post today (still not accepted in FSF GDB):
[0/9]#2 Fix lost siginfo_t
http://sourceware.org/ml/gdb-patches/2010-08/msg00480.html

ugdb should support qXfer:siginfo, currently accessible only via $_siginfo
print/set, though.
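For reference, a `qXfer:siginfo:read` request uses the generic `qXfer:OBJECT:read:ANNEX:OFFSET,LENGTH` form with hex offset and length, and the stub answers `m...` when more data follows or `l...` for the last chunk.  A minimal parsing sketch, assuming an empty annex; `parse_siginfo_read()` and `xfer_reply_kind()` are hypothetical helpers, not ugdb or gdbserver code, and the binary escaping of the reply payload is left out.

```c
#include <stdio.h>
#include <string.h>

/* Parse "qXfer:siginfo:read::OFFSET,LENGTH" (hex fields).
 * Returns 0 on success, -1 on a malformed packet.  */
static int parse_siginfo_read(const char *pkt, unsigned long *offset,
                              unsigned long *length)
{
    static const char prefix[] = "qXfer:siginfo:read::";
    char trailing;

    if (strncmp(pkt, prefix, sizeof prefix - 1) != 0)
        return -1;
    /* Exactly two conversions means nothing follows LENGTH.  */
    if (sscanf(pkt + sizeof prefix - 1, "%lx,%lx%c",
               offset, length, &trailing) != 2)
        return -1;
    return 0;
}

/* Reply prefix for a read of LENGTH bytes at OFFSET from an object of
 * TOTAL bytes: 'm' if more data follows, 'l' for the last chunk.  */
static char xfer_reply_kind(unsigned long total, unsigned long offset,
                            unsigned long length)
{
    if (offset >= total)
        return 'l';     /* Nothing left: the reply is just "l".  */
    return offset + length < total ? 'm' : 'l';
}
```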


Thanks,
Jan



Re: Q: multiple inferiors, all-stop vCont

2010-08-03 Thread Jan Kratochvil
On Tue, 03 Aug 2010 18:53:59 +0200, Oleg Nesterov wrote:
 On 08/03, Jan Kratochvil wrote:
  On Tue, 03 Aug 2010 16:30:04 +0200, Oleg Nesterov wrote:
   However, I do not really understand how this can work reliably in the
   terms of remote protocol. Somehow this scheme relies on the fact that
   gdb will send another vCont;t:pTGID.-1 _once again_ after the previous
   vCont;t:pTGID.-1, and gdbserver can report the other threads via
   Stop/vStopped. OK, I hope this doesn't matter.
 
  attach_command_post_wait:
 
      /* At least the current thread is already stopped.  */
 
      /* In all-stop, by definition, all threads have to be already
         stopped at this point.  In non-stop, however, although the
         selected thread is stopped, others may still be executing.
         Be sure to explicitly stop all threads of the process.  This
         should have no effect on already stopped threads.  */
      if (non_stop)
        target_stop (pid_to_ptid (inferior->pid));
 
 This just reflects the current situation with the current implementation.
 gdb already did
 
   vAttach;PID
   vCont;t:pPID.-1
 
 I do not see anything in the _documentation_ which could explain that
 only the main thread can be stopped despite the fact -1 means all
 threads.

-1 really means all threads: all those gdbserver knows about at that time.

Anyway, this double-stop issue is gdbserver/libthread_db specific and off-topic
for ugdb.


 Once again, I already understand why gdb + gdbserver work this way,
 I meant remote protocol in general.

In the remote protocol (and even internally in gdbserver), -1 really always
means all the (currently known) threads.
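To make the thread-id part concrete: in the multiprocess syntax `pPID.TID` both fields are hex and `-1` selects all threads, so `vCont;t:p517.-1` addresses every known thread of process 0x517 = 1303 (the `attach 1303` in the backtraces below).  A minimal sketch of parsing that field and of the `$payload#checksum` framing every packet travels in; `parse_ptid()` and `frame_packet()` are hypothetical names, not GDB or gdbserver functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a multiprocess thread-id "pPID.TID".  Both fields are hex;
 * strtol() also accepts the "-1" meaning all threads.  Returns 0 on
 * success, -1 otherwise.  */
static int parse_ptid(const char *s, long *pid, long *tid)
{
    char *end;

    if (s[0] != 'p')
        return -1;
    *pid = strtol(s + 1, &end, 16);
    if (end == s + 1 || *end != '.')
        return -1;
    s = end + 1;
    *tid = strtol(s, &end, 16);
    if (end == s || *end != '\0')
        return -1;
    return 0;
}

/* Frame PAYLOAD as "$PAYLOAD#CC", CC being the modulo-256 sum of the
 * payload bytes as two hex digits.  Escaping of '$', '#' and '}' is not
 * handled.  Returns the framed length, or -1 if BUF is too small.  */
static int frame_packet(char *buf, size_t size, const char *payload)
{
    size_t i, len = strlen(payload);
    unsigned char sum = 0;
    int n;

    for (i = 0; i < len; i++)
        sum += (unsigned char) payload[i];
    n = snprintf(buf, size, "$%s#%02x", payload, sum);
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```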


 And in fact, I do not think your explanation is correct. Yes, this
 attach_command_post_wait() is called during attach. But even after that
 gdbserver reports only the main thread. This happens before qSymbol
 stage.

This attach_command_post_wait code is executed after the qSymbol command.

The first single-thread vCont:

#0  putpkt (buf=0x1f348b0 "vCont;t:p517.-1") at remote.c:6730
#1  in remote_stop_ns (ptid=...) at remote.c:4709
#2  in remote_stop (ptid=...) at remote.c:4747
#3  in target_stop (ptid=...) at target.c:3031
#4  in attach_command (args=0x7fffd861 "1303", from_tty=1) at infcmd.c:2436
#5  in do_cfunc (c=0x1db8bf0, args=0x7fffd861 "1303", from_tty=1) at ./cli/cli-decode.c:67
#6  in cmd_func (cmd=0x1db8bf0, args=0x7fffd861 "1303", from_tty=1) at ./cli/cli-decode.c:1771
#7  in execute_command (p=0x7fffd864 "3", from_tty=1) at top.c:422
#8  in catch_command_errors (command=0x48a3e3 <execute_command>, arg=0x7fffd85a "attach 1303", from_tty=1, mask=6) at exceptions.c:534

The second all-threads vCont:

#0  putpkt (buf=0x1f4ecb0 "vCont;t:p517.-1") at remote.c:6730
#1  in remote_stop_ns (ptid=...) at remote.c:4709
#2  in remote_stop (ptid=...) at remote.c:4747
#3  in target_stop (ptid=...) at target.c:3031
#4  in attach_command_post_wait (args=0x1f3b6f0 "1303", from_tty=1, async_exec=0) at infcmd.c:2334
#5  in attach_command_continuation (args=0x1f3b6a0) at infcmd.c:2355
#6  in do_my_cleanups (pmy_chain=0x7fffcd08, old_chain=0x0) at utils.c:421
#7  in do_all_inferior_continuations () at utils.c:692
#8  in inferior_event_handler (event_type=INF_EXEC_COMPLETE, client_data=0x0) at inf-loop.c:96
#9  in fetch_inferior_event (client_data=0x0) at infrun.c:2649
#10 in fetch_inferior_event_wrapper (client_data=0x0) at inf-loop.c:169
#11 in catch_errors (func=0x6b4287 <fetch_inferior_event_wrapper>, func_args=0x0, errstring=0xe378dd "", mask=6) at exceptions.c:518
#12 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:65
#13 in remote_async_serial_handler (scb=0x1f30b00, context=0x0) at remote.c:10317
#14 in push_event (context=0x1f30b00) at ser-base.c:176
#15 in handle_timer_event (dummy=...) at event-loop.c:1306
#16 in process_event () at event-loop.c:399
#17 in gdb_do_one_event (data=0x0) at event-loop.c:452
#18 in catch_errors (func=0x6b0d2a <gdb_do_one_event>, func_args=0x0, errstring=0xe07943 "", mask=6) at exceptions.c:518
#19 in tui_command_loop (data=0x0) at ./tui/tui-interp.c:171
#20 in current_interp_command_loop () at interps.c:291
#21 in captured_command_loop (data=0x0) at ./main.c:227
#22 in catch_errors (func=0x47ff66 <captured_command_loop>, func_args=0x0, errstring=0xdc6967 "", mask=6) at exceptions.c:518
#23 in captured_main (data=0x7fffd360) at ./main.c:910


 But, it is very possible I missed something. And again, I think (I hope ;)
 we can forget this because the simple method works too.

This discussion is really offtopic for ugdb.


 I was afraid there are some other reason why we can't avoid libthread_db.

Roland has correctly pointed out the TLS support.  But that will come later.


 Yes, I do understand vAttach issues, but I thought that attach
 command should always hide these details. From the documentation:
 
   attach PROCESS-ID

Re: gdbstub initial code, another approach

2010-08-02 Thread Jan Kratochvil
On Wed, 28 Jul 2010 20:17:02 +0200, Oleg Nesterov wrote:
   - the testing was very limited. I played with it about
 an hour and didn't find any problems, but that is all.
[...]
 Btw, gdb crashes very often right after
 
   (gdb) set target-async on
   (gdb) set non-stop
   (gdb) file mt-program
   (gdb) target extended-remote :port
   (gdb) attach its_pid
 
 I didn't even try to investigate (this doesn't happen when
 it works with the real gdbserver). Just retry, gdb is buggy.

I tried it with both /bin/sleep and a threaded testcase and never got a crash
(kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).

$ killall gdbstub;~/redhat/threadit&p=$!;~/redhat/gdbstub ~/redhat/out&sleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex "file $HOME/redhat/threadit" -ex 'target extended-remote :2000' -ex "attach $p" -ex 'set confirm no';kill $p;
gdbstub: no process killed
[6] 22822
[7] 22823
GNU gdb (GDB) 7.2.50.20100802-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Reading symbols from /home/jkratoch/redhat/threadit...done.
Remote debugging using :2000
Attached to process 22822
[New Thread 22822.22822]
[New Thread 22822.22825]
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x7fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
89  lll_wait_tid (pd->tid);
(gdb) 
[Thread 22822.22825] #2 stopped.
0x7fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Current language:  auto
The current source language is auto; currently asm.
info threads 
  2 Thread 22822.22825  0x7fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 22822.22822  0x7fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
(gdb) q
[7]+  Done                    ~/redhat/gdbstub ~/redhat/out
[6]+  Terminated              ~/redhat/threadit


Thanks,
Jan



Re: gdbstub initial code, another approach

2010-07-30 Thread Jan Kratochvil
On Fri, 30 Jul 2010 16:41:24 +0200, Oleg Nesterov wrote:
 IOW, you think that it is better to shift gdbserver into kernel-space than
 port the existing one to the new API or write the new one in user space ?

So far I just assumed kernel-space ugdb is the plan.  As I wrote before, I do
not know gdbserver very well.

If you check gdb/gdbserver/linux-low.c, it is just one big interface to ptrace,
wait and /proc.  I would guess it could be simpler with the utrace API at hand.

I thought catching up with systemtap's 200x higher software-watchpoint
performance over the current (local) gdb (described in "[debug-list] Utrace
Discussion Notes", off this list) could be easier with an in-kernel gdb.


Thanks,
Jan



Re: clone bug (glibc?) (Was: clone-multi-ptrace test failure)

2009-12-14 Thread Jan Kratochvil
On Tue, 01 Dec 2009 20:39:40 +0100, Roland McGrath wrote:
 I think the best bet is to link with -Wl,-z,now and then minimize the
 library code you rely on.

Checked in the fix below, which fixes the failure at least on Fedora 12 x86_64.

getppid() does not seem to be needed there: PTRACE_SYSCALL does stop
(WIFSTOPPED) on the entry to __NR_exit (before WIFEXITED), which keeps the
PASS/FAIL reproducibility.
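The claim about `PTRACE_SYSCALL` can be checked with a small tracer.  A minimal sketch, assuming Linux; `count_syscall_stops()` is a hypothetical helper.  It counts every SIGTRAP syscall stop until `WIFEXITED`, so the count may include stops from glibc's `raise()` epilogue, but the entry stop of the raw `__NR_exit` alone guarantees at least one, without any `getppid()` call.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Trace a child with PTRACE_SYSCALL until it exits.  Stores the number
 * of syscall stops in *STOPS; returns the exit code, or -1 on error.  */
static int count_syscall_stops(int *stops)
{
    int status;
    pid_t child = fork();

    *stops = 0;
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(100);
        raise(SIGSTOP);
        /* Raw exit syscall, as in grandchild_func() above.  */
        syscall(__NR_exit, 22);
        _exit(101);     /* Not reached.  */
    }

    /* Initial SIGSTOP signal-delivery stop.  */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    for (;;) {
        /* Resume (suppressing any pending signal) until the next
         * syscall entry/exit stop.  */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) != 0)
            return -1;
        if (waitpid(child, &status, 0) != child)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        /* Syscall stops are reported as SIGTRAP.  */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            (*stops)++;
    }
}
```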


Regards,
Jan


--- Makefile.am 29 Nov 2009 02:23:25 -  1.60
+++ Makefile.am 14 Dec 2009 09:47:54 -  1.61
@@ -111,6 +111,8 @@ stopped_attach_transparency_LDFLAGS = -l
 erestartsys_trap_LDFLAGS = -lutil
 erestartsys_trap_debugger_LDFLAGS = -lutil
 erestartsys_trap_32fails_debugger_LDFLAGS = -lutil
+# After clone syscall it must call no glibc code (such as _dl_runtime_resolve).
+clone_multi_ptrace_LDFLAGS = -Wl,-z,now
 
 check_TESTS = $(SAFE)
 xcheck_TESTS = $(CRASHERS)
--- clone-multi-ptrace.c 5 Dec 2008 14:41:57 -   1.6
+++ clone-multi-ptrace.c 14 Dec 2009 09:47:54 -  1.7
@@ -65,10 +65,10 @@ static char grandchild_seen[THREAD_NUM];
 static int
 grandchild_func (void *unused)
 {
-  /* Need to have at least one syscall before exit */
-  getppid ();
-  /* _exit() would make ALL threads to exit. We need rew syscall */
+  /* _exit() would make ALL threads to exit.  We need rew syscall.  After the
+     clone syscall it must call no glibc code (such as _dl_runtime_resolve).  */
   syscall (__NR_exit, 22);
+
   return 0;
 }
 



Re: [PATCH v2] ptrace-tests: fix step-fork.c on powerpc for ptrace-utrace

2009-12-14 Thread Jan Kratochvil
On Tue, 01 Dec 2009 18:38:27 +0100, Veaceslav Falico wrote:
 Instead of using fork(), call syscall(__NR_fork) in step-fork.c
 to avoid looping on powerpc arch in libc.

Checked in.  (I have not seen any problems with using glibc after the syscall,
unlike in the clone-multi-ptrace.c case, so I left it as is.)


Regards,
Jan


 Signed-off-by: Veaceslav Falico vfal...@redhat.com
 ---
 
 --- a/ptrace-tests/tests/step-fork.c  2009-12-01 17:17:14.0 +0100
 +++ b/ptrace-tests/tests/step-fork.c  2009-12-01 18:35:15.0 +0100
 @@ -29,6 +29,7 @@
  #include <unistd.h>
  #include <sys/wait.h>
  #include <string.h>
 +#include <sys/syscall.h>
  #include <signal.h>
  
  #ifndef PTRACE_SINGLESTEP
 @@ -78,7 +79,12 @@ main (int argc, char **argv)
   sigprocmask (SIG_BLOCK, &mask, NULL);
   ptrace (PTRACE_TRACEME);
   raise (SIGUSR1);
 - if (fork () == 0)
 +
 + /*
 +  * Can't use fork() directly because on powerpc it loops inside libc under
 +  * PTRACE_SINGLESTEP. See http://marc.info/?l=linux-kernel&m=125927241130695
 +  */
 + if (syscall(__NR_fork) == 0)
 {
   read (-1, NULL, 0);
   _exit (22);



Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)

2009-12-14 Thread Jan Kratochvil
On Wed, 09 Dec 2009 19:12:41 +0100, Oleg Nesterov wrote:
   while the '.func_name' is the text address.
 
  tried to change the code to
 
  REGS_ACCESS (regs, nip) = (unsigned long) .raise_sigusr2
 
  but gcc doesn't like this ;)
...
 Yes, I verified the patch below fixes step-jump-cont.c on
 ibm-js20-02.lab.bos.redhat.com.

Checked in a similar patch, using the same approach other testcases already
use; sorry for not using yours.


Regards,
Jan


--- step-jump-cont.c 8 Dec 2008 18:23:41 -   1.12
+++ step-jump-cont.c 14 Dec 2009 11:38:37 -  1.13
@@ -213,6 +213,24 @@ int main (void)
   REGS_ACCESS (regs, eip) = (unsigned long) raise_sigusr2;
 #elif defined __x86_64__
   REGS_ACCESS (regs, rip) = (unsigned long) raise_sigusr2;
+#elif defined __powerpc64__
+  {
+    /* ppc64 `raise_sigusr2' resolves to the function descriptor.  */
+    union
+      {
+        void (*f) (void);
+        struct
+          {
+            void *entry;
+            void *toc;
+          }
+        *p;
+      }
+    const func_u = { raise_sigusr2 };
+
+    REGS_ACCESS (regs, nip) = (unsigned long) func_u.p->entry;
+    REGS_ACCESS (regs, gpr[2]) = (unsigned long) func_u.p->toc;
+  }
 #elif defined __powerpc__
   REGS_ACCESS (regs, nip) = (unsigned long) raise_sigusr2;
 #else



Re: Tests Failures on PPC64

2009-12-14 Thread Jan Kratochvil
On Wed, 09 Dec 2009 19:31:52 +0100, Oleg Nesterov wrote:
 Hmm. it is obviously racy, static volatile unsigned started
 is not atomic and thus the main thread can hang doing
 
   while (started < THREADS);
 
 not that I think this explains the failure though.

Thanks, fixed (but the problem is not reproducible for me).


Regards,
Jan


--- ppc-dabr-race.c 8 Dec 2008 18:23:41 -   1.8
+++ ppc-dabr-race.c 14 Dec 2009 12:03:49 -  1.9
@@ -141,13 +141,14 @@ handler_fail (int signo)
   assert (0);
 }
 
+/* STARTED requires atomic access.  */
 static volatile unsigned started;
 
 static void *child_thread (void *data)
 {
   pid_t tid = gettid ();
 
-  started++;
+  __sync_add_and_fetch (&started, 1);
 
   /* We should stay in the syscall - better race probability.  */
   sleep (1);
@@ -178,7 +179,7 @@ static void child_func (void)
   assert (i == 0);
 }
 
-  while (started < THREADS);
+  while (__sync_add_and_fetch (&started, 0) < THREADS);
 
   l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
   assert (l == 0);
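The race Oleg points out is that `started++` is a non-atomic read-modify-write, so concurrent threads can lose increments and the main thread then spins forever.  A self-contained sketch of the fixed pattern, assuming GCC's legacy `__sync` builtins as in the patch (build with `-pthread`); `THREADS` and `start_and_wait()` are illustrative names.  Adding 0 serves as an atomic read with a full barrier.

```c
#include <pthread.h>
#include <stddef.h>

#define THREADS 4

/* Shared counter; every access goes through a __sync builtin.  */
static volatile unsigned started;

static void *child_thread(void *data)
{
    (void) data;
    __sync_add_and_fetch(&started, 1);
    return NULL;
}

/* Start THREADS threads, spin until all have checked in, then join them.
 * Returns the final count, or -1 if a thread could not be created.  */
static int start_and_wait(void)
{
    pthread_t th[THREADS];
    int i;

    started = 0;
    for (i = 0; i < THREADS; i++)
        if (pthread_create(&th[i], NULL, child_thread, NULL) != 0) {
            while (i-- > 0)
                pthread_join(th[i], NULL);
            return -1;
        }
    /* Atomic read with a full barrier: no increment can be missed.  */
    while (__sync_add_and_fetch(&started, 0) < THREADS)
        ;
    for (i = 0; i < THREADS; i++)
        pthread_join(th[i], NULL);
    return (int) started;
}
```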



Re: step-into-handler.c compilation failure on ppc64

2009-12-13 Thread Jan Kratochvil
On Sat, 05 Dec 2009 18:19:20 +0100, Roland McGrath wrote:
 How about this?
 
 --- step-into-handler.c   10 Dec 2008 04:42:43 -0800  1.8
 +++ step-into-handler.c   05 Dec 2009 09:18:54 -0800  
[...]
 @@ -113,11 +114,11 @@ handler_alrm_get (void)
  {
  #if defined __powerpc64__
/* ppc64 `handler_alrm' resolves to the function descriptor.  */
 -  return *(void **) handler_alrm;
 +  return *(void **) (uintptr_t) handler_alrm;
  /* __s390x__ defines both the symbols.  */
  #elif defined __s390__ && !defined __s390x__
/* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary.  */
[...]

On Sat, 05 Dec 2009 18:39:05 +0100, CAI Qian wrote:
 Thanks. Fixed.

I have to say it did not help for me (gcc-4.4.2-7.el6.ppc64).
error: dereferencing type-punned pointer will break strict-aliasing rules

Checked-in the union-based fix below (both tests PASS on ppc64).


Regards,
Jan

--- erestartsys.c   27 Nov 2009 22:50:31 -  1.13
+++ erestartsys.c   14 Dec 2009 00:38:42 -  1.14
@@ -38,6 +38,7 @@
 #include <stddef.h>
 #include <pty.h>
 #include <string.h>
+#include <stdint.h>
 
 #if defined __x86_64__
 # define REGISTER_IP .rip
@@ -298,8 +299,23 @@ main (int argc, char **argv)
   user = user_orig;
   user REGISTER_IP = (unsigned long) func;
 #ifdef __powerpc64__
-  user.nip = ((const unsigned long *) func)[0]; /* entry */
-  user.gpr[2] = ((const unsigned long *) func)[1]; /* TOC */
+  {
+    /* ppc64 `func' resolves to the function descriptor.  */
+    union
+      {
+        void (*f) (void);
+        struct
+          {
+            void *entry;
+            void *toc;
+          }
+        *p;
+      }
+    const func_u = { func };
+
+    user.nip = (uintptr_t) func_u.p->entry;
+    user.gpr[2] = (uintptr_t) func_u.p->toc;
+  }
 #endif
   /* GDB amd64_linux_write_pc():  */
   /* We must be careful with modifying the program counter.  If we
--- step-into-handler.c 8 Dec 2008 18:23:41 -   1.8
+++ step-into-handler.c 14 Dec 2009 00:38:42 -  1.9
@@ -113,7 +113,19 @@ handler_alrm_get (void)
 {
 #if defined __powerpc64__
   /* ppc64 `handler_alrm' resolves to the function descriptor.  */
-  return *(void **) handler_alrm;
+  union
+    {
+      void (*f) (int signo);
+      struct
+        {
+          void *entry;
+          void *toc;
+        }
+      *p;
+    }
+  const func_u = { handler_alrm };
+
+  return func_u.p->entry;
 /* __s390x__ defines both the symbols.  */
 #elif defined __s390__ && !defined __s390x__
   /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary.  */



Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)

2009-12-07 Thread Jan Kratochvil
On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
 But. raise_sigusr2 is not equal to the actual address of raise_sigusr2(),
 this value points to the thunk (I do not know the correct English term)

ppc64 calls it a function descriptor (see GDB's
ppc64_linux_convert_from_func_ptr_addr):
   For PPC64, a function descriptor is a TOC entry, in a data section,
   which contains three words: the first word is the address of the
   function, the second word is the TOC pointer (r2), and the third word
   is the static chain value.

(gdb) x/8gx 0x805b6f6258
0x805b6f6258 <open>:    0x00805b65cf68  0x00805b702ac0
0x805b6f6268 <open64>:  0x00805b65d010  0x00805b702ac0

(gdb) x/20i 0x00805b65cf68
0x805b65cf68 <.__GI___open>:    lwz     r10,-30432(r13)
0x805b65cf6c <.__GI___open+4>:  cmpwi   r10,0
0x805b65cf70 <.__GI___open+8>:  bne-    0x805b65cf84 <.__GI___open+28>

(gdb) info sym 0x00805b702ac0
last_nip in section .bss

I was not aware of any third word before, and I do not see one there.
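The union used in the patches reads the first two words of such a descriptor.  An alternative sketch, assuming ppc64 ELFv1 and a POSIX system where code and data pointers have the same size: copying through `memcpy()` also avoids the type-punning error gcc-4.4 reported.  `struct func_desc`, `desc_addr_of()` and `read_func_desc()` are hypothetical names; on other architectures a function pointer is simply the entry address, so `read_func_desc()` is only meaningful on ppc64.

```c
#include <stddef.h>
#include <string.h>

/* Leading words of a ppc64 ELFv1 function descriptor, as in the union
 * above.  GDB's comment also mentions a third (static chain) word,
 * which is not read here.  */
struct func_desc {
    void *entry;    /* Address of the function's code.  */
    void *toc;      /* Value to load into r2.  */
};

/* On ppc64 ELFv1 a function pointer's value is the descriptor address.
 * Copying it into a data pointer relies on POSIX, not ISO C.  */
static const void *desc_addr_of(void (*fn)(void))
{
    const void *addr;

    _Static_assert(sizeof fn == sizeof addr, "code/data pointer size");
    memcpy(&addr, &fn, sizeof addr);
    return addr;
}

/* Read the descriptor at DESC_ADDR without dereferencing a
 * type-punned pointer.  */
static struct func_desc read_func_desc(const void *desc_addr)
{
    struct func_desc d;

    memcpy(&d, desc_addr, sizeof d);
    return d;
}
```

On ppc64 one would then use `read_func_desc (desc_addr_of (raise_sigusr2)).entry` as the value for `nip`.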


Regards,
Jan



Re: utrace-ptrace gdb testsuite tesults

2009-11-29 Thread Jan Kratochvil
On Wed, 25 Nov 2009 23:30:37 +0100, Jan Kratochvil wrote:
   Please point at some built or easily buildable kernel .rpm first.
  
  http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/
 
 OK, taken for reverification.

I followed up on the differences found by Qian and verified that none of them
is a regression in the GDB testsuite (I did not verify the suspicious ppc one).


Regards,
Jan



Re: utrace-ptrace gdb testsuite tesults

2009-11-29 Thread Jan Kratochvil
On Sun, 29 Nov 2009 23:39:59 +0100, Jan Kratochvil wrote:
 Followed the differences found by Qian and verified none of them (did not
 verify the ppc suspicious one) has any regression in GDB testsuite.

Forgot the log FYI.


Regards,
Jan


-result-2.6.31.5-127.fc12.x86_64/gdb
+result-2.6.32-0.53.rc8.496.fc13.x86_64/gdb

*attach*stop* generally unchecked, it should be covered by ptrace-testsuite and
the GDB testcases are currently racy.

-FAIL: gdb.base/follow-child.exp: break
+PASS: gdb.base/follow-child.exp: break
= unstable testcase, RH-specific, dropped as redundant to other testcases

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m64/gdb/testsuite
-PASS: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt
-PASS: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
+FAIL: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt (timeout)
+FAIL: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
= racy, ignored

-PASS: gdb.base/foll-fork.exp: default parent follow, no catchpoints
+FAIL: gdb.base/foll-fork.exp: (timeout) default parent follow, no catchpoints
= racy, fixed the testcase upstream

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m64/gdb/testsuite.unix.-m32
-FAIL: gdb.threads/attach-stopped.exp: threaded: attach1, exit leaves process stopped
+PASS: gdb.threads/attach-stopped.exp: threaded: attach1, exit leaves process stopped
= racy, ignored

-FAIL: gdb.base/interrupt.exp: continue
+PASS: gdb.base/interrupt.exp: continue
 FAIL: gdb.base/interrupt.exp: echo data (timeout)
 ERROR: Undefined command .
 UNRESOLVED: gdb.base/interrupt.exp: Send Control-C, second time
 FAIL: gdb.base/interrupt.exp: signal SIGINT (the program is no longer running)
-FAIL: gdb.base/interrupt.exp: echo more data (timeout)
-FAIL: gdb.base/interrupt.exp: send end of file
+PASS: gdb.base/interrupt.exp: echo more data
+FAIL: gdb.base/interrupt.exp: send end of file (eof)
= both kernels behave the same - correctly, updated erestart* tests set,
for x86_64-x86_64-i386 (kernel-debugger-inferior) GDB needs a fix:
http://sourceware.org/ml/gdb-patches/2009-11/msg00592.html

-FAIL: gdb.server/ext-run.exp: get process list
+PASS: gdb.server/ext-run.exp: get process list
= upstream gdbserver data corruption

-FAIL: gdb.java/jnpe.exp: next over NPE
+PASS: gdb.java/jnpe.exp: next over NPE
= fixed the testcase in archer

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m32/gdb/testsuite.unix.-m32
-ERROR: Couldn't send info inferior 16 to GDB.
-UNRESOLVED: gdb.base/multi-forks.exp: Did kill 16
+PASS: gdb.base/multi-forks.exp: Run to exit 11
= always ignored by me, IMO racy

-PASS: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt
-PASS: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
+FAIL: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt (timeout)
+FAIL: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
= racy, ignored

-FAIL: gdb.cp/constructortest.exp: running to main in runto
 PASS: gdb.cp/constructortest.exp: breaking on A::A
-FAIL: gdb.cp/constructortest.exp: continue to breakpoint: First line A
-FAIL: gdb.cp/constructortest.exp: Verify in in-charge A::A
-FAIL: gdb.cp/constructortest.exp: continue to breakpoint: First line A
-FAIL: gdb.cp/constructortest.exp: Verify in not-in-charge A::A
+PASS: gdb.cp/constructortest.exp: continue to breakpoint: First line A
+PASS: gdb.cp/constructortest.exp: Verify in in-charge A::A
+PASS: gdb.cp/constructortest.exp: continue to breakpoint: First line A
+PASS: gdb.cp/constructortest.exp: Verify in not-in-charge A::A
-FAIL: gdb.pie/break.exp: run until function breakpoint
-FAIL: gdb.pie/break.exp: run until breakpoint set at a line number
-FAIL: gdb.pie/break.exp: run until file:function(6) breakpoint
-FAIL: gdb.pie/break.exp: run until file:function(5) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(4) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(3) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(2) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(1) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until quoted breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:linenum breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: breakpoint offset +1
-FAIL: gdb.pie/break.exp: step onto breakpoint (the program is no longer running)
+PASS: gdb.pie/break.exp: run until function breakpoint
+PASS: gdb.pie/break.exp: run until breakpoint set at a line number
+PASS: gdb.pie/break.exp: run until file:function(6) breakpoint
+PASS: gdb.pie/break.exp: run until file:function(5) breakpoint
+PASS: gdb.pie/break.exp: run until file:function(4) breakpoint
+PASS: gdb.pie

Re: utrace-ptrace gdb testsuite tesults

2009-11-27 Thread Jan Kratochvil
On Fri, 27 Nov 2009 15:11:09 +0100, Veaceslav Falico wrote:
 -FAIL: gdb.base/foll-fork.exp: unpatch child, unpatched parent breakpoints from child (timeout)
 +PASS: gdb.base/foll-fork.exp: unpatch child, unpatched parent breakpoints from child
 -PASS: gdb.base/foll-fork.exp: set follow parent, hit tbreak
 +FAIL: gdb.base/foll-fork.exp: (timeout) set follow parent, hit tbreak

To be ignored, fixed upstream:
http://sourceware.org/ml/gdb-patches/2009-11/msg00573.html


 -PASS: gdb.mi/mi-nsmoribund.exp: resume all, program exited normally
 +FAIL: gdb.mi/mi-nsmoribund.exp: unexpected stop
 -KFAIL: gdb.threads/watchthreads2.exp: gdb can drop watchpoints in multithreaded app (PRMS: gdb/10116)
 +PASS: gdb.threads/watchthreads2.exp: all threads incremented x

These are known to be unstable, but there are some known watchpoint and
non-stop problems, so it may not even be a testcase-side bug.


Therefore this test shows no changes/regressions.


Regards,
Jan



Re: utrace-ptrace gdb testsuite tesults

2009-11-27 Thread Jan Kratochvil
On Fri, 27 Nov 2009 15:34:05 +0100, Oleg Nesterov wrote:
 Jan, if you see something particular which needs more attention or should
 be fixed, please let me know. I'll try to investigate then.

I have not yet finished the verifications started yesterday, but so far no
kernel behavior change has been proven and I doubt one will be.  I am going to
reply today.

The ppc kernel should be checked too, but I do not have a matching pair of
non-utrace/utrace kernel rpms built for it.


Regards,
Jan



Re: GDB Testsuite Results with CONFIG_UTRACE i686

2009-11-25 Thread Jan Kratochvil
Hi,

the gdb.pie/break.exp change would be worth checking more closely, but this
test is based on the old PIE patch with various known problems, and RHEL-6
will get a different/new PIE patch implementation.

It would also be worth checking whether the gdb.base/bigcore.exp and
gdb.base/follow-child.exp changes are stable across multiple runs of those
specific testcases.

You can also check the gdb.log differences; sometimes it is apparent that the
change is OK.  Otherwise, if the change is stable across multiple runs and it
is not obvious to you why it changed, could you please provide the
hostname/password/etc. of the machine, as you already have it ready?


Thanks,
Jan



Re: GDB Testsuite Results on POWERPC

2009-11-25 Thread Jan Kratochvil
On Wed, 25 Nov 2009 09:59:11 +0100, Ananth N Mavinakayanahalli wrote:
 Essentially, there is *no* change in any of the numbers with and without
 ptrace over utrace.

While that is probably so, please rather check a diff of the *.sum files, as
some of the results are fuzzy and, in a rare case, two results changing
FAIL->PASS and PASS->FAIL will not show in this summary.


Thanks,
Jan



Re: utrace-ptrace gdb testsuite tesults

2009-11-25 Thread Jan Kratochvil
On Wed, 25 Nov 2009 22:17:15 +0100, Roland McGrath wrote:
  In general everything where is a word thread has unstable results and
  nonstop tests are also a bit unstable.
 
 So where exactly is the problem in these cases?  Are the tests overly
 timing-sensitive where there is no actual behavior bug?  Or is gdb overly
 timing-sensitive where there is no actual kernel bug?  Or is it just
 unknown, and might be a kernel bug after all (even an undiagnosed one in
 vanilla kernels)?

gdb.server/server-run.exp: gdbserver has a data overflow/corruption bug;
   occasionally it crashes, occasionally the test passes.
gdb.mi/mi-nonstop-exit.exp: Some race in GDB non-stop code.
gdb.threads/attach-stopped.exp: Race in the testcase (I think so).
etc.

But in most cases I do not know: gdb.log is commonly not enough to find the
problem, and the failure is often not reproducible on the 2nd..nth run.  I and
upstream have already caught many races, but a lot of them still remain.


  There are IMO/hopefully very few cases tested by the gdb testsuite and still
  not covered by the ptrace-testsuite, I even do not much expect we will see
  again a new utrace regression caught by the gdb testsuite & uncaught by the
  ptrace-testsuite.
 
 That's certainly good to hear.  If you are pretty confident about that,
 then I am quite happy to consider nonregression on all of ptrace-tests the
 sole gating test for kernel changes.  We just don't want to wind up having
 other upstream reviewers notice a regression using gdb that we didn't
 notice before we submitted a kernel change.

I have not audited the GDB codebase for all its ptrace calls in any way.

For a kernel patch submitted after a long development period it is probably
still worth checking it against GDB.


  Please point at some built or easily buildable kernel .rpm first.
 
 http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/

OK, taken for reverification.


Regards,
Jan



Re: [PATCH 1-13] utrace-ptrace V1, for internal review

2009-11-25 Thread Jan Kratochvil
On Tue, 24 Nov 2009 12:31:41 +0100, Srikar Dronamraju wrote:
 When I get the latest set of ptrace-tests by using. 
 cvs -d :pserver:anoncvs:anon...@sources.redhat.com:/cvs/systemtap co 
 ptrace-tests

 1. Am I using the right source of ptrace-tests or has its location
 changed.

Yes, it is right; the webpage is at:
http://sourceware.org/systemtap/wiki/utrace/tests


 2. Are these new testcases x86 architecture specific? 

step-from-clone + syscall-from-clone

Fixed/checked-in, now they SKIP (rc 77) on unsupported arches.

New support for ppc/ppc64 PASSes.

New support for s390/s390x FAILs (kernel-2.6.18-164.6.1.el5.s390x).
orig_gpr2 seems to be erroneously set to the retval (gprs[2]).
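The SKIP (rc 77) on unsupported arches relies on the Automake test-driver convention that exit status 77 means SKIP. A minimal sketch of that pattern, assuming a compile-time architecture check; `arch_supported', `run_arch_test' and the architecture list are illustrative, not the actual ptrace-tests code.

```c
#include <assert.h>

/* The Automake test driver reports exit status 77 as SKIP.  */
#define EXIT_SKIP 77

/* Hypothetical helper: nonzero if this build has an arch-specific
   implementation of the test.  The list of architectures is illustrative.  */
static int
arch_supported (void)
{
#if defined (__i386__) || defined (__x86_64__) \
    || defined (__powerpc__) || defined (__s390__)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical test body: 0 means PASS, 1 FAIL, EXIT_SKIP SKIP.  */
static int
run_arch_test (void)
{
  if (!arch_supported ())
    return EXIT_SKIP;   /* report SKIP rather than a spurious FAIL */
  /* The arch-specific checks would run here.  */
  return 0;
}
```

main () would then simply `return run_arch_test ();', so an unsupported arch shows up as SKIP instead of FAIL in the testsuite summary.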


 3. Shouldn't arch/powerpc/include/asm/user.h not define
 user_regs_struct?

Not sure why but ppc uses `struct pt_regs'.


Thanks,
Jan



Re: utrace-ptrace detach with signal semantics

2009-10-10 Thread Jan Kratochvil
On Sat, 10 Oct 2009 18:24:21 +0200, Oleg Nesterov wrote:
 On 10/06, Jan Kratochvil wrote:
 
  Yes, I agree with the current general behavior of ptrace there is missing:
     if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
 
  [...snip...]
 
  --- attach-into-signal.c  31 Jan 2009 21:11:40 -  1.5
  +++ attach-into-signal.c  6 Oct 2009 14:27:08 -
  @@ -224,6 +224,18 @@ static void reproduce (void)
             child = 0;
             return;
           }
  +  /* SIGPIPE was still pending and it has not been yet delivered.  */
  +  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
  +    {
  +      /* Deliver it and get the queued SIGSTOP.  */
  +      errno = 0;
  +      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
  +      assert_perror (errno);
  +
  +      errno = 0;
  +      pid = waitpid (child, &status, 0);
  +      assert (pid == child);
  +    }
     assert (WIFSTOPPED (status));
     assert (WSTOPSIG (status) == SIGSTOP);
     /* let tracee run. it must be killed very soon by SIGPIPE */
 
 Jan, please revert this change.

Reverted (VERBOSE-caught with FAIL now).


Jan



Re: utrace-ptrace detach with signal semantics

2009-10-10 Thread Jan Kratochvil
On Sat, 10 Oct 2009 18:48:29 +0200, Oleg Nesterov wrote:
 On 10/10, Jan Kratochvil wrote:
  (VERBOSE-caught with FAIL now).
 
 Cough. please translate this to me ;)

(Cc'ing Roland on each such mail does not look right to me, but removing Ccs
is not right either.)


--- tests/attach-into-signal.c  6 Oct 2009 19:21:35 -   1.6
+++ tests/attach-into-signal.c  10 Oct 2009 16:47:46 -  1.7
@@ -227,14 +227,8 @@ static void reproduce (void)
   /* SIGPIPE was still pending and it has not been yet delivered.  */
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
     {
-      /* Deliver it and get the queued SIGSTOP.  */
-      errno = 0;
-      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
-      assert_perror (errno);
-
-      errno = 0;
-      pid = waitpid (child, &status, 0);
-      assert (pid == child);
+      VERBOSE ("Forbidden to catch pending signal from PTRACE_DETACH");
+      exit (1);
     }
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);



Re: [PATCH 83] ptrace(DETACH, SIGKILL) should really kill the tracee

2009-10-10 Thread Jan Kratochvil
On Sat, 10 Oct 2009 18:17:12 +0200, Oleg Nesterov wrote:
 Roland, Jan, what user-space expects ptrace(DETACH, SIGKILL) should do?
 
 My guess: this should really kill the tracee asap, hence this patch.

The attached testcase works for me on both:
kernel-2.6.31.1-48.fc12.x86_64
kernel-2.6.30.5-43.fc11.x86_64

Does it FAIL for you, so that it is worth adding to the testsuite?


Thanks,
Jan
/* Test case that (PTRACE_DETACH, SIGKILL) really does kill the tracee.

   This software is provided 'as-is', without any express or implied
   warranty.  In no event will the authors be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely.  */

#define _GNU_SOURCE 1
#include <assert.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <errno.h>

static pid_t child;

static void
cleanup (void)
{
  if (child > 0)
    kill (child, SIGKILL);
  child = 0;
}

static void
handler_fail (int signo)
{
  cleanup ();
  signal (signo, SIG_DFL);
  raise (signo);
}

int
main (void)
{
  pid_t got_pid;
  int status;
  long l;

  atexit (cleanup);
  signal (SIGABRT, handler_fail);
  signal (SIGINT, handler_fail);

  child = fork ();
  switch (child)
    {
    case -1:
      /* fork failed: errno is set, so assert_perror aborts here.  */
      assert_perror (errno);

    case 0:
      /* Child: become a tracee and stop with SIGUSR1.  */
      l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
      assert (l == 0);

      raise (SIGUSR1);
      /* Reaching this point means PTRACE_DETACH did not kill us.  */
      _exit (42);

    default:
      break;
    }

  got_pid = waitpid (child, &status, 0);
  assert (got_pid == child);
  assert (WIFSTOPPED (status));
  assert (WSTOPSIG (status) == SIGUSR1);

  /* Detach, passing SIGKILL as the signal to deliver.  */
  errno = 0;
  l = ptrace (PTRACE_DETACH, child, NULL, (void *) (long) SIGKILL);
  assert_perror (errno);
  assert (l == 0);

  got_pid = waitpid (child, &status, 0);
  assert (got_pid == child);
  assert (WIFSIGNALED (status));
  assert (WTERMSIG (status) == SIGKILL);
  /* The child has been reaped; keep cleanup () from signalling a reused PID.  */
  child = 0;

  return 0;
}


Re: utrace-ptrace detach with signal semantics

2009-10-07 Thread Jan Kratochvil
On Wed, 07 Oct 2009 15:33:49 +0200, Oleg Nesterov wrote:
 On 10/06, Jan Kratochvil wrote:
  It should work also for PTRACE_SINGLESTEP.
 
 Heh. Yes, but with one exception.
 
   - the tracee has a handler for, say, SIGHUP
 
   - the tracee dequeues SIGHUP, reports to the tracer,
 and stops.
 
   - the tracer does ptrace(SINGLESTEP, SIGHUP)
 
 // it could use another signr, this works.
 // but the tracer must have a handler or
 // everything is OK.
 
   - the tracee delivers SIGHUP to itself, handle_signal()
 notices TIF_SINGLESTEP and calls ptrace_notify().
 
 Now, the tracee reports SIGTRAP, but the next time the tracer does
 ptrace(WHATEVER, SIGNR) SIGNR will be ignored.

OK, this is really a corner case I did not have in mind.

In such a case the SIGHUP handler has still not fully executed, and as
non-realtime signals do not nest (they are not counted), it is OK that it
gets activated only once.
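The point that non-realtime signals are not counted can be checked without any ptrace. In this hypothetical helper a signal is blocked, raised twice, then unblocked, and the handler runs are counted: a standard signal such as SIGHUP collapses into one pending instance, while a realtime signal such as SIGRTMIN is queued twice (Linux behavior; the helper name is an assumption).

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t hits;

static void
count_handler (int signo)
{
  (void) signo;
  hits++;
}

/* Raise SIGNO twice while it is blocked, then unblock it and return how
   many times the handler ran.  */
static int
deliveries_after_two_raises (int signo)
{
  struct sigaction sa;
  sigset_t set, old;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = count_handler;
  sigemptyset (&sa.sa_mask);
  sigaction (signo, &sa, NULL);

  sigemptyset (&set);
  sigaddset (&set, signo);
  sigprocmask (SIG_BLOCK, &set, &old);

  hits = 0;
  raise (signo);
  raise (signo);    /* merged for standard signals, queued for RT ones */

  /* Unblocking delivers the pending instances before sigprocmask returns.  */
  sigprocmask (SIG_SETMASK, &old, NULL);
  return hits;
}
```

deliveries_after_two_raises (SIGHUP) returns 1 while deliveries_after_two_raises (SIGRTMIN) returns 2.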


I meant a more ordinary case such as:

ptrace (PTRACE_SINGLESTEP, 0) = 0
waitpid() = SIGTRAP
ptrace (PTRACE_SINGLESTEP, 0) = 0
waitpid() = SIGTRAP
ptrace (PTRACE_DETACH, SIGSTOP) = 0

which I assume will work.
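The sequence above can be written as a small self-check. A sketch under stated assumptions: the tracee is a forked PTRACE_TRACEME child that stops itself with SIGSTOP, and after the detach the real parent uses WUNTRACED to see the job-control stop. The function name is hypothetical, and the expected result reflects current Linux behavior rather than anything stated in this thread.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Trace a child, single-step it twice without a signal, then detach with
   SIGSTOP.  Returns 1 if the real parent then sees a SIGSTOP job stop.  */
static int
singlestep_then_detach_stopped (void)
{
  int status, ok;
  long l;
  pid_t got;
  pid_t child = fork ();

  assert (child != -1);
  if (child == 0)
    {
      l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
      assert (l == 0);
      raise (SIGSTOP);            /* reported to the tracer */
      for (;;)
        pause ();
    }

  got = waitpid (child, &status, 0);
  assert (got == child && WIFSTOPPED (status));

  for (int i = 0; i < 2; i++)
    {
      /* DATA 0: discard the reported signal, execute one instruction.  */
      l = ptrace (PTRACE_SINGLESTEP, child, NULL, NULL);
      assert (l == 0);
      got = waitpid (child, &status, 0);
      assert (got == child);
      assert (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP);
    }

  /* Detach, delivering SIGSTOP in place of the reported SIGTRAP.  */
  l = ptrace (PTRACE_DETACH, child, NULL, (void *) (long) SIGSTOP);
  assert (l == 0);

  /* No longer traced: WUNTRACED is needed to see the job-control stop.  */
  got = waitpid (child, &status, WUNTRACED);
  ok = got == child && WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP;

  kill (child, SIGKILL);
  waitpid (child, &status, 0);
  return ok;
}
```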


Thanks,
Jan



Re: utrace-ptrace detach with signal semantics

2009-10-06 Thread Jan Kratochvil
On Tue, 06 Oct 2009 15:10:10 +0200, Oleg Nesterov wrote:
 On 10/06, Jan Kratochvil wrote:
  On Mon, 05 Oct 2009 21:00:37 +0200, Oleg Nesterov wrote:
   On 10/05, Jan Kratochvil wrote:
Naive programs expect the first signal after PTRACE_ATTACH will be SIGSTOP.
  
   They should not, this is just wrong.
 
  That may be a right point but such programs are in use out there.  Sure if it
  would be a real difficulty one can keep it as-is as GDB-7.0 soon to be
  released has it already fixed, strace works with it.  Still ltrace crashes the
  inferior in such case.
 
 Confused. Do you mean we should fix the kernel to match this
 expectation?

Yes, I was thinking it would be a good idea.

 This was never true.

I agree.

 You attached the test-case which sends SIGALRM to itself in a loop.
 If the tracer attaches to this program, it is very possible that
 SIGALRM will be reported, not SIGSTOP.

Yes.  And real-world tracers do not expect that.


   This test-case also does:
  
 /* detach with SIGPIPE/attach. This should kill tracee */
 ptrace (PTRACE_DETACH, child, (void *) 1, (void *) SIGPIPE);
  
 ptrace (PTRACE_ATTACH, child, (void *) 0, (void *) 0);
  
 waitpid (child, &status, 0);
 assert (WIFSIGNALED (status) && WTERMSIG (status) == SIGPIPE);
  
   It fails if the second PTRACE_ATTACH sees SIGPIPE. This is what
   I can't understand.

The second PTRACE_ATTACH keyword is on line 167.
The first SIGPIPE keyword is on line 199.
Line 167 cannot see anything from line 199.

I assume you meant the third PTRACE_ATTACH.

Line 222:  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGPIPE)
Line 227:  assert (WIFSTOPPED (status));
Line 228:  assert (WSTOPSIG (status) == SIGSTOP);

Yes, I agree that with the current general behavior of ptrace this case is
missing:
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)


 It fails on ptrace-over-utrace.

Attached.  I do not have a kernel.rpm with ptrace-over-utrace ready, so please
verify it and check it in (or ping me).


 Once again. Suppose that the tracer does ptrace(PTRACE_DETACH, SIGXXX).
 Currently, if the next tracer attaches right after this detach it has no
 way to intercept SIGXXX, it will be never reported via ptrace_signal().

Whether or not it gets reported to the new tracer, SIGXXX should never get
lost.  If it is not reported to the new tracer then it will always be
processed by the tracee, is that right?


 Is this really important? Do you know any application which can be
 broken if we change this behaviour? With the current utrace-ptrace
 implementation SIGXXX can be reported to the new tracer.

I think there is no application that would handle a non-SIGSTOP first signal
after PTRACE_ATTACH but would get confused by a non-SIGSTOP first signal after
a PTRACE_ATTACH that follows a PTRACE_DETACH.


 OK, this relates to the first signal should be SIGSTOP but this
 is wrong anyway, and the case above is very unlikely.

If `the first signal should be SIGSTOP' is not satisfied (current state),
I think you can freely change whether SIGXXX will be reported to the new
tracer, and we should apply the attached ptrace-testsuite patch.

If `the first signal should be SIGSTOP' gets fixed/satisfied (proposed state),
I think it is clear that SIGXXX from PTRACE_DETACH must not be lost and must
not be visible as the first signal after PTRACE_ATTACH.  In such a case the
ptrace-testsuite testcase attach-into-signal should be simplified a lot to
always just require SIGSTOP as the first signal after PTRACE_ATTACH, and the
attached change becomes irrelevant.


Thanks,
Jan
--- attach-into-signal.c  31 Jan 2009 21:11:40 -  1.5
+++ attach-into-signal.c  6 Oct 2009 14:27:08 -
@@ -224,6 +224,18 @@ static void reproduce (void)
       child = 0;
       return;
     }
+  /* SIGPIPE was still pending and it has not been yet delivered.  */
+  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
+    {
+      /* Deliver it and get the queued SIGSTOP.  */
+      errno = 0;
+      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
+      assert_perror (errno);
+
+      errno = 0;
+      pid = waitpid (child, &status, 0);
+      assert (pid == child);
+    }
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
   /* let tracee run. it must be killed very soon by SIGPIPE */


Re: utrace-ptrace detach with signal semantics

2009-10-06 Thread Jan Kratochvil
On Tue, 06 Oct 2009 19:14:28 +0200, Oleg Nesterov wrote:
 On 10/06, Jan Kratochvil wrote:
[...]
  If it is not reported to the new tracer then it will be always
  processed by the tracee, is it right?
 
 Yes, sure.
 
 (But, just in case... if the tracer does ptrace(DETACH, SIGNR), this
  signr only matters if the tracee was stopped after reporting syscall
  or signal, otherwise SIGNR is ignored).

In which specific cases can SIGNR get ignored?

The whole PTRACE_DETACH will be ignored if the tracee is not stopped.
SIGNR will probably be ignored if the tracee is already dead.
Otherwise SIGNR should get delivered, shouldn't it?


  +  /* SIGPIPE was still pending and it has not been yet delivered.  */
  +  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
[...]
 Yes, I didn't verify this yet, but I think with this patch the
 test-case should succeed with utrace-ptrace kernel.

Checked-in.


Thanks,
Jan



Re: utrace-ptrace detach with signal semantics

2009-10-06 Thread Jan Kratochvil
On Tue, 06 Oct 2009 22:05:16 +0200, Oleg Nesterov wrote:
 For example, the tracee reports PTRACE_EVENT_EXEC and stops. In this
 case SIGNR has no effect after PTRACE_CONT/DETACH/etc.
 
 SIGNR is not ignored after the tracee reported syscall entry/exit or
 signal.

OK, if it is only exceptional cases such as PTRACE_EVENT_EXEC, that should not
matter, I think.  It should work also for PTRACE_SINGLESTEP.
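Following Oleg's explanation, a tracer can tell from the waitpid() status whether a signal passed as DATA to the next resuming request will matter. A minimal sketch, assuming the Linux status encoding in which PTRACE_EVENT_* stops carry the event number above WSTOPSIG (status >> 16); the helper name is hypothetical, and syscall stops are lumped with signal stops because, per Oleg's reply, SIGNR is honored there too.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: classify the STATUS that waitpid() returned for a
   tracee.  Returns 1 if a signal given as DATA to the next PTRACE_CONT,
   PTRACE_DETACH, ... would be delivered (signal-delivery or syscall stop),
   0 if it would be ignored (PTRACE_EVENT_* stop, e.g. PTRACE_EVENT_EXEC),
   and -1 if the tracee is not stopped at all.  */
static int
resume_signal_honored (int status)
{
  if (!WIFSTOPPED (status))
    return -1;
  /* PTRACE_EVENT_* stops report (SIGTRAP | event << 8) as the stop code.  */
  if ((status >> 16) != 0)
    return 0;
  return 1;
}
```

The statuses below are built by hand in the Linux layout ((stopcode << 8) | 0x7f for a stop) instead of coming from a real tracee.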


Thanks,
Jan



Re: Stopped detach/attach status

2009-10-05 Thread Jan Kratochvil
On Mon, 05 Oct 2009 04:32:08 +0200, Oleg Nesterov wrote:
 On 10/01, Jan Kratochvil wrote:
 
  the ptrace-testsuite
  http://sourceware.org/systemtap/wiki/utrace/tests
 
  currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for:
  FAIL: detach-stopped
  FAIL: stopped-attach-transparency
[...]
 As for user-space, I don't really understand the second test-case,
 this again means I don't understand the supposed behaviour.

The high-level goal is described at the top of the testcase.  Users expect
that if they run `gstack PID' or `gcore PID', the target PID will be in
exactly the same state as before gstack/gcore.

That means it keeps both whether it was or was not stopped, and also whether
or not a pending signal notification exists for a possible future waitpid()
from its real (non-ptrace) parent PID.

Whether what the test technically does is right is another question, but this
high-level goal is hopefully valid.


 Except, stopped-attach-transparency prints
 
   Excessive waiting SIGSTOP after the second attach/detach
 
 afaics the test-case is not right here. attach_detach() leaves the
 traced threads in STOPPED state, why pid_notifying_sigstop() should
 fail?

[ Not replying to this part; I have not built a kernel with this patch yet.  ]


 In this case, I don't understand why stopped-attach-transparency
 sends SIGSTOP to every sub-thread. If the tracer wants to stop
 the thread group after detach, it can do
 
   ptrace(PTRACE_DETACH, anythread, SIGSTOP);
   for_each_other_thread(pid)
   ptrace(PTRACE_DETACH, anythread, 0);
 
 or just
 
   kill(SIGSTOP);
   for_each_thread(pid)
   ptrace(PTRACE_DETACH, anythread, 0);

OK, if this is the recommended way, I can fix the testcase this way.
Sending SIGSTOP to all the threads IIRC worked on linux-2.6.9, but I do not
think this part of compatibility must be kept.
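Oleg's first variant above can be written as a small helper. A sketch, assuming the caller already traces every thread of the group, has waited for each of them, and has collected their thread IDs (for instance from /proc/PID/task); the helper name is hypothetical.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Detach from every thread of a traced thread group so that the whole group
   is left in a job-control stop: SIGSTOP is passed on the first detach only,
   the remaining threads are detached with no signal.  Returns 0 on success,
   -1 if any PTRACE_DETACH failed.  */
static int
detach_leaving_stopped (pid_t first, const pid_t *others, size_t n_others)
{
  int rc = 0;

  if (ptrace (PTRACE_DETACH, first, NULL, (void *) (long) SIGSTOP) != 0)
    rc = -1;
  for (size_t i = 0; i < n_others; i++)
    if (ptrace (PTRACE_DETACH, others[i], NULL, NULL) != 0)
      rc = -1;
  return rc;
}
```

For a single-threaded tracee this reduces to one PTRACE_DETACH with SIGSTOP, after which the real parent sees the stop via waitpid (..., WUNTRACED).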



Thanks,
Jan



Stopped detach/attach status

2009-10-01 Thread Jan Kratochvil
Hi Oleg,

the ptrace-testsuite
http://sourceware.org/systemtap/wiki/utrace/tests

currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for:
FAIL: detach-stopped
FAIL: stopped-attach-transparency

Do you agree with the testcases, and is there a plan to fix them for F12?


Thanks,
Jan



Re: Q: what user_enable_single_step() actually means?

2009-09-23 Thread Jan Kratochvil
On Wed, 23 Sep 2009 02:36:54 +0200, Roland McGrath wrote:
 It would be worthwhile to cons a version of this test case that uses
 PTRACE_SINGLESTEP instead of PTRACE_SYSCALL.  I think your situation
 is tickling the same issue, but we should have an empirical test.
[...]
 I have a fix in hand that I'll send upstream before too long.  But perhaps
 it should wait for the PTRACE_SINGLESTEP version of the test case.

I see you already added one yourself:

2009-09-23 05:31  roland

* tests/: Makefile.am (1.56), step-from-clone.c (1.1):

Add step-from-clone test.


Regards,
Jan



Re: [PATCH 38] make sure PTRACE_CONT disables SYSCALL_EXIT report

2009-09-18 Thread Jan Kratochvil
On Fri, 18 Sep 2009 00:17:24 +0200, Roland McGrath wrote:
 For any test case you found useful, please add it to the ptrace-tests
 suite.  Jan can help you get it in the right form and get it committed.

Checked-in as:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/sigint-before-syscall-exit.c?cvsroot=systemtap


 Or if he would prefer not to keep maintaining that suite, we can get you
 set up on sourceware so you can do it yourself.

With the current utrace work it is no longer a full-time assignment, so it is
OK this way.


Thanks,
Jan



Re: attach-wait-on-stopped vs detach-stopped

2008-08-08 Thread Jan Kratochvil
Hi Roland,

thanks for your detailed explanation, which makes the complex problem look easy.


On Fri, 08 Aug 2008 07:46:35 +0200, Roland McGrath wrote:
 In the latest upstream kernels, detach-stopped is the only ptrace-tests
 case failing.  A fix I tried for that worked, but made attach-wait-on-stopped
 start failing instead.
 
 Can you tell me if you think the expectation in attach-wait-on-stopped 
 really seems correct?  It seems to be contrary to what detach-stopped wants.
 
 In attach-wait-on-stopped, this happens:
 
   untraced child stops with normal SIGSTOP
   parent does not wait, stopped state still to be waited for
   parent does PTRACE_ATTACH
   -> child still in job stop,
      now has pending SIGSTOP
   parent does wait, sees it stopped with SIGSTOP (the first one)
   parent does PEEKUSR, GETREGS (should make no difference)
   parent does PTRACE_DETACH
 * -> child has never left job stop,
      is still in job stop,
      stays in job stop after detach, does not wake up
   parent does PTRACE_ATTACH
   -> child still in job stop, but has been waited for
      still pending SIGSTOP (third one came but second one still waiting)
   parent does wait, blocks since child is waited-for but still stopped
 
 What happened before my fix was that PTRACE_DETACH unconditionally woke the
 thread up from whatever state it was in.  So here,

Just to note, your *here* refers to the *-marked line.

 it woke up, saw the old pending SIGSTOP, and stopped again (ptrace
 stop)--now with a fresh still to be waited for stopped status.

My explanation:
This SIGSTOP you describe was generated by PTRACE_ATTACH.  As we are already
past PTRACE_DETACH (with no TracerPid) when this SIGSTOP is delivered, we get
into the `T (stopped)' state (and not the `T (tracing stop)' state).


 But this wakeup on PTRACE_DETACH was exactly what detach-stopped does not
 want to see.
 
 attach-wait-on-stopped uses PTRACE_DETACH,0 while detach-stopped uses
 PTRACE_DETACH,SIGSTOP.

With the `attach-wait-on-stopped uses PTRACE_DETACH,0' part of the testcase
I just tried to pinpoint the utrace-ptrace difference that was considered
a regression.  Upstream GDB did not support attaching to stopped processes
before, and its detach-as-stopped behavior is still undefined.
=> I am not aware of any real-world problems caused by FAILing the
second-attach case of attach-wait-on-stopped.


 So both tests can be satisfied if what it means
 is that PTRACE_DETACH always wakes up a thread (even one that has never
 left job control stop), but it should stop again for the new SIGSTOP.
 (The reason it doesn't stop again now is an esoteric internal one.)
 
 Is that what you think the rule ought to be?

Yes.  OTOH I do not see why your way would cause any real-world trouble, if
you find it more systematic.


 The if in job stop, stay in job stop rule seems more sensible to me.  That
 would make detach-stopped pass and attach-wait-on-stopped fail.
 
 As you're aware, the subtle difference between staying stopped and waking
 up followed by an immediate stop is the freshening of the wait status and
 wakeup of a parent/tracer's blocked wait calls.

The goal of the GDB attach-detach behavior is to be fully transparent.
Running /usr/bin/gcore (GDB attach+gcore+detach commands) should leave the
process in a perfectly unchanged state.

We have to eat the pending SIGSTOP notification during `attach'.  With the
`PTRACE_ATTACH, tkill(SIGSTOP), PTRACE_CONT(0), waitpid()' trick (recent
upstream or mid-term RH/Fedora) GDB copes even with stopped processes whose
pending SIGSTOP notification has already been eaten.

For GCORE I find it friendlier to possibly generate one excessive SIGSTOP
notification than to possibly eat the only remaining SIGSTOP notification.
There are at least some applications which run an external GCORE on their
SIGSTOP-ped sub-processes and which may (not confirmed) expect waitpid() to
give them SIGSTOP afterwards, as it worked before (specifically on RHEL-4,
2.6.9 non-utrace).

I do not know of a race-free way to find out whether the SIGSTOP notification
was already pending before PTRACE_ATTACH (BTW the `/proc/PID/status' content
does not differ between a pending and an eaten notification).  Therefore
asking for two ways to PTRACE_DETACH (leaving / not leaving a pending
notification) is out of the question.



Thanks,
Jan



Re: x86_64-cs failed in x86

2008-07-24 Thread Jan Kratochvil
Hi,

On Thu, 24 Jul 2008 08:38:16 +0200, Wenji Huang wrote:
 [tests]$ ./x86_64-cs
 ./x86_64-cs: WIFSTOPPED - WSTOPSIG = 4
 x86_64-cs: x86_64-cs.c:160: main: Assertion `0' failed.

Thanks, committed (also, unexpected values are now a PASS; the only real
problem would be a kernel crash).


Regards,
Jan



Re: x86 single-step issues

2008-07-10 Thread Jan Kratochvil
On Wed, 09 Jul 2008 23:28:55 +0200, Roland McGrath wrote:
 I have some more fixes in the x86 bowels about ready to send upstream.
 From the status quo upstream, my changes get FAIL->PASS for
 step-jump-cont-strict (32 and 64), step-through-sigret (32).

Even step-jump-cont-strict, great.

 Does that cover all the issues you know about?

Yes, thanks.


Jan



Re: Issues when attaching to stopped process

2008-07-05 Thread Jan Kratochvil
On Mon, 09 Jun 2008 19:23:09 +0200, Matthew Legendre wrote:

 We're seeing issues when trying to attached to an already stopped process 
 on recent utrace kernels (seen on Fedora Core 8 and 9)--waitpid reports  
 the arrival of numerous signal 0s,

Being tracked as `stop-attach-then-wait' at:
http://sourceware.org/systemtap/wiki/utrace/tests

While you are right that the ptrace-on-utrace emulation is currently
incompatible with ptrace, for proper behavior during later ptrace operations
you should resume the job control stop first [attached].  The code comes from
GDB, see:
http://sourceware.org/ml/gdb-patches/2008-05/msg00022.html


Thanks,
Jan
--- test_attached_to_stopped.c  2008-06-10 23:36:54.0 +0200
+++ test_attached_to_stopped-jk.c   2008-06-11 19:19:08.0 +0200
@@ -19,6 +19,38 @@ void stop_self()
kill(getpid(), SIGSTOP);
 }
 
+/* gdb/linux-nat.c  */
+/* Detect `T (stopped)' in `/proc/PID/status'.
+   Other states including `T (tracing stop)' are reported as false.  */
+
+static int
+pid_is_stopped (pid_t pid)
+{
+  FILE *status_file;
+  char buf[100];
+  int retval = 0;
+
+  snprintf (buf, sizeof (buf), "/proc/%d/status", (int) pid);
+  status_file = fopen (buf, "r");
+  if (status_file != NULL)
+    {
+      int have_state = 0;
+
+      while (fgets (buf, sizeof (buf), status_file))
+        {
+          if (strncmp (buf, "State:", 6) == 0)
+            {
+              have_state = 1;
+              break;
+            }
+        }
+      if (have_state && strstr (buf, "T (stopped)") != NULL)
+        retval = 1;
+      fclose (status_file);
+    }
+  return retval;
+}
+
 int attach_then_run(void (*func)(void))
 {
int pid, result;
@@ -32,6 +64,28 @@ int attach_then_run(void (*func)(void))
       perror("Ptrace attach error");
 exit(-1);
   }
+  if (pid_is_stopped(pid)) {
+    /* gdb/linux-nat.c  */
+
+    /* The process is definitely stopped.  It is in a job control
+       stop, unless the kernel predates the TASK_STOPPED /
+       TASK_TRACED distinction, in which case it might be in a
+       ptrace stop.  Make sure it is in a ptrace stop; from there we
+       can kill it, signal it, et cetera.
+
+       First make sure there is a pending SIGSTOP.  Since we are
+       already attached, the process can not transition from stopped
+       to running without a PTRACE_CONT; so we know this signal will
+       go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
+       probably already in the queue (unless this kernel is old
+       enough to use TASK_STOPPED for ptrace stops); but since SIGSTOP
+       is not an RT signal, it can only be queued once.  */
+    kill (pid, SIGSTOP);    /* tgkill() is required for threads!  */
+
+    /* Finally, resume the stopped process.  This will deliver the SIGSTOP
+       (or a higher priority signal, just like normal PTRACE_ATTACH).  */
+    ptrace (PTRACE_CONT, pid, 0, 0);
+  }
   return pid;
}
 

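The `kill (pid, SIGSTOP)' in the patch above carries the note that tgkill() is required for threads: kill() queues a process-wide signal, whereas for an attached sub-thread the SIGSTOP should target that thread. A minimal sketch using the raw syscall, since glibc only gained a tgkill() wrapper in version 2.30; the helper name is hypothetical.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send SIGNO to thread TID of thread group TGID only.
   kill (TID, SIGNO) would instead queue a process-wide signal that any
   thread of the group may dequeue.  Returns 0 on success, -1 with errno
   set on failure (e.g. EINVAL for a non-positive TID, ESRCH if gone).  */
static int
send_to_thread (pid_t tgid, pid_t tid, int signo)
{
  return (int) syscall (SYS_tgkill, tgid, tid, signo);
}
```

In the patch this would become `send_to_thread (pid, lwp, SIGSTOP)' for each attached thread, where `lwp' is a hypothetical variable holding that thread's ID; signal 0 can be used to merely check that the thread exists.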

Re: Is PTRACE_SINGLEBLOCK buggy?

2008-06-02 Thread Jan Kratochvil
On Mon, 02 Jun 2008 11:09:56 +0200, Renzo Davoli wrote:
 Jan Kratochvil has just sent me an E-mail saying that it seems to be 
 a kvm bug (or a bug caused by kvm).

KVM bug details at https://bugzilla.redhat.com/show_bug.cgi?id=437028 .

 He is right: using qemu/kqemu instead of kvm it does not panic.
 
 Anyway I am puzzled. Using kvm the PTRACE_SINGLEBLOCK should have the
 same effect on 2.6.25.4 and 2.6.25.4+utrace.
 2.6.25.4: ptrace_resume(kernel/ptrace.c)->user_enable_block_step
 2.6.25.4+utrace:
  ptrace_common(kernel/ptrace.c) sets UTRACE_ACTION_BLOCKSTEP
  ->utrace_quiescent(kernel/utrace.c) tests UTRACE_ACTION_BLOCKSTEP
  ->user_enable_block_step
 I wonder where is the difference...

Just FYI on 2.6.25 I still get the crash,
  host: kernel: kvm: 19661: cpu0 unhandled wrmsr: 0x1d9 data 2
kernel-2.6.25.3-18.fc9.x86_64
kvm-65-7.fc9.x86_64
  guest: vanilla 2.6.25 x86_64
 Pid: 1945, comm: block-step Not tainted 2.6.25-0.101.rc4.git3.fc8 #1
 RIP: 0010:[8100ab79]  [8100ab79] __switch_to+0x218/0x2bc
 (the version number is for a RPM-built vanilla kernel)
(I did not find any ptrace patches in between 2.6.25 and 2.6.25.4.)


Regards,
Jan



ptrace testsuite: reparent-zombie* race

2008-06-02 Thread Jan Kratochvil
Hi Roland,

I randomly hit a race:
reparent-zombie: reparent-zombie.c:88: create_zombie: Assertion `fd != -1' failed.
Aborted
on kernel-2.6.25.3-18.fc9.x86_64.

I hope the attached patch is right (tested only with reparent-zombie.c, as
reparent-zombie-clone.c crashes the kernel).


Best Regards,
Jan
--- tests/reparent-zombie.c 2 May 2008 01:27:20 -   1.1
+++ tests/reparent-zombie.c 2 Jun 2008 12:40:01 -
@@ -78,15 +78,19 @@ create_zombie (void)
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGUSR1);
 
+  /* We must open the status file first: if CHILD finished between
+     PTRACE_CONT and this open, the open would fail with ENOSRCH, as no
+     zombie is left since we have set the SIGCHLD handler to SIG_IGN (the
+     kernel reaps dead children without creating any zombies).  */
+  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
+  fd = open (buf, O_RDONLY);
+  assert (fd != -1);
+
   errno = 0;
   l = ptrace (PTRACE_CONT, child, 0l, 0l);
   assert_perror (errno);
   assert (l == 0);
 
-  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
-  fd = open (buf, O_RDONLY);
-  assert (fd != -1);
-
   do
 {
   sched_yield ();
@@ -173,6 +177,8 @@ main (void)
   signal (SIGABRT, handler_fail);
   signal (SIGALRM, handler_fail);
 
+  /* SIG_IGN as we want no zombies left: the kernel reaps dead children
+     without creating any zombies.  */
   signal (SIGCHLD, SIG_IGN);
 
   fd = create_zombie ();
--- tests/reparent-zombie-clone.c   2 May 2008 01:27:20 -   1.1
+++ tests/reparent-zombie-clone.c   2 Jun 2008 12:44:15 -
@@ -123,6 +123,14 @@ create_zombie (void)
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
 
+  /* We must open the status file first: if MSG finished between
+     PTRACE_CONT and this open, the open would fail with ENOSRCH, as no
+     zombie is left since we have set the SIGCHLD handler to SIG_IGN (the
+     kernel reaps dead children without creating any zombies).  */
+  snprintf (buf, sizeof buf, "/proc/%d/status", (int) msg);
+  fd = open (buf, O_RDONLY);
+  assert (fd != -1);
+
   errno = 0;
   l = ptrace (PTRACE_CONT, msg, 0l, 0l);
   assert_perror (errno);
@@ -135,10 +143,6 @@ create_zombie (void)
 
   child = msg;
 
-  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
-  fd = open (buf, O_RDONLY);
-  assert (fd != -1);
-
   do
 {
   sched_yield ();
@@ -225,6 +229,8 @@ main (void)
   signal (SIGABRT, handler_fail);
   signal (SIGALRM, handler_fail);
 
+  /* SIG_IGN as we want no zombies left: the kernel reaps dead children
+     without creating any zombies.  */
   signal (SIGCHLD, SIG_IGN);
 
   fd = create_zombie ();

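The race fixed above comes from the auto-reaping the new comments describe. A minimal sketch, not taken from the testsuite, showing that with SIGCHLD set to SIG_IGN no zombie is left: waitpid() waits for the child to terminate and then fails with ECHILD, as POSIX specifies and Linux implements. The helper name is hypothetical.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the errno of waitpid() for a child that exits while SIGCHLD is
   SIG_IGN, or 0 if waitpid() unexpectedly succeeded.  */
static int
waitpid_errno_with_sigchld_ignored (void)
{
  void (*old) (int) = signal (SIGCHLD, SIG_IGN);
  int status, err = 0;
  pid_t child = fork ();

  assert (child != -1);
  if (child == 0)
    _exit (0);

  /* The kernel reaps the child itself, so there is never a zombie to
     collect: waitpid blocks until the child is gone, then fails.  */
  if (waitpid (child, &status, 0) == -1)
    err = errno;

  signal (SIGCHLD, old);
  return err;
}
```

This is also why the patch opens /proc/PID/status before PTRACE_CONT: once the child exits, its /proc entry can disappear at any moment.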

Re: Tests about bug step-jump-cont

2008-03-17 Thread Jan Kratochvil
Hi Wenji,

while I cannot comment on your kernel code analysis, the testcase had
definitely been broken since 2008-02-03: it never PASSed.  It should be fixed now.

  /* We must set PC to our new function as the current PC stays in the glibc
 function RAISE no matter which part of the code called it - we would have
 to save and restore the whole stack for a proper restart of the code.  */

I was not sure of its correctness, sorry for the delay.


Regards,
Jan


On Thu, 13 Mar 2008 10:25:04 +0100, Wenji Huang wrote:
 Hi,

 I made tests of step-jump-cont (utrace wiki page) on i686 and x86_64 with 
 upstream 2.6.24 kernel. They have different behaviors.

 With help of assert statement and stap script, I got the following 
 understandings:

 For i686:
 1. Wait child stop upon SIGUSR1
 2. Set singlestep on child :  child->ptrace |= PT_DTRACE and
 regs->eflags |= TRAP_FLAG
 3. Change child regs->eflags |= TRAP_FLAG
 4. Continue the child and clear child->ptrace and regs->eflags due to
 passed checking child->ptrace
 5. Wait child stop, got signal SIGUSR2
 6. Change the child regs->eflags |= TRAP_FLAG
 7. Continue the child, but couldn't clear regs->eflags due to failed
 checking child->ptrace
 8. Wait child, but got signal SIGTRAP due to eflags (Child stop on
 sending SIGUSR2)

 For x86_64:
 1. Wait child stop upon SIGUSR1
 2. Set singlestep on child :  child->ptrace |= PT_DTRACE and
 regs->eflags |= TRAP_FLAG.
   (*** But these are missing after the syscall ***)
 3. Change child regs->eflags |= TRAP_FLAG
 4. Continue the child, but couldn't clear regs->eflags due to failed
 checking child->ptrace
 5. Wait child, but got signal SIGTRAP due to eflags (Child stop on
 sending SIGUSR1).

 So I think it may be correct in i686 case, just need to change testcase. 
 But it looks like there are some problems in x86_64 code.

 Regards,
 Wenji