Re: gdbstub initial code, v11
On Wed, 22 Sep 2010 21:09:12 +0200, Tom Tromey wrote: I think it would be good to implement a feature that shows how this approach is an improvement over the current state of gdb+ptrace or gdb+gdbserver. Exactly what feature this should be... I don't know :-) I would imagine something performance-related. I would bet on a massive threads creating/deleting testcase signalling tasks around, together with watchpoints. There are races in the linux-nat code and IIRC even gdbserver code. OTOH if one tries hard one can probably manage one day to fix all the corner cases in the ptrace based linux-nat and gdbserver. Regards, Jan
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On Mon, 20 Sep 2010 21:42:19 +0200, Oleg Nesterov wrote: Test-case: Checked in http://sourceware.org/systemtap/wiki/utrace/tests as step-detach. Thanks, Jan
Re: gdbstub initial code, v9
Hi Oleg, kernel-devel-2.6.34.6-54.fc13.x86_64 (real F13) says: ugdb.c:1988: error: implicit declaration of function ‘hex_to_bin’ Jan
Re: gdbstub initial code, v9
On Thu, 09 Sep 2010 18:30:31 +0200, Oleg Nesterov wrote:
OOPS! indeed, unhex() confuses lo and hi. It works for 0xcc, though.

Cough... could you tell me how can I change the variable done without printing it?

(gdb) help set variable
Evaluate expression EXP and assign result to variable VAR, using assignment
syntax appropriate for the current language (VAR = EXP or VAR := EXP for
example). VAR may be a debugger convenience variable (names starting
with $), a register (a few standard names starting with $), or an actual
variable in the program being debugged. EXP is any valid expression.
This may usually be abbreviated to simply set.

Regards, Jan
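To illustrate the lo/hi mix-up mentioned above, here is a minimal sketch of decoding one hex byte. It is not the ugdb code; hex_digit_value(), unhex_byte() and unhex_byte_swapped() are hypothetical names made up for this example (hex_digit_value() only stands in for a helper such as the kernel's hex_to_bin()). The swapped variant still decodes 0xcc correctly because both nibbles are equal.

```c
#include <assert.h>

/* Hypothetical helper: value of one hex digit, or -1 if CH is not one.  */
static int
hex_digit_value (char ch)
{
  if (ch >= '0' && ch <= '9')
    return ch - '0';
  if (ch >= 'a' && ch <= 'f')
    return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F')
    return ch - 'A' + 10;
  return -1;
}

/* Decode the two hex digits at P into one byte, -1 on a bad digit.
   The FIRST digit is the high nibble.  */
static int
unhex_byte (const char *p)
{
  int hi = hex_digit_value (p[0]);
  int lo = hex_digit_value (p[1]);

  if (hi < 0 || lo < 0)
    return -1;
  return (hi << 4) | lo;
}

/* The same with lo and hi confused, shown for comparison only.  */
static int
unhex_byte_swapped (const char *p)
{
  int lo = hex_digit_value (p[0]);
  int hi = hex_digit_value (p[1]);

  if (hi < 0 || lo < 0)
    return -1;
  return (hi << 4) | lo;
}
```

With these, "cc" decodes to 0xcc either way, while "1f" gives 0x1f from unhex_byte() but 0xf1 from the swapped variant.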
Re: gdbstub initial code, v7
On Thu, 02 Sep 2010 22:06:32 +0200, Oleg Nesterov wrote:
I assume that qXfer:siginfo:read always means the Hg thread.

It seems so.

It is not clear to me what should ugdb report if there is no valid siginfo. linux_xfer_siginfo() returns E01, but gdbserver uses SIGSTOP to stop the tracee,

I find an error more appropriate in such a case.

Likewise, it is not clear what should ugdb do if gdb sends $CSIG in this case.

Currently GDB does not do anything special. That is, if there is a siginfo for signal SIGUSR1 but one does $C0B (SIGSEGV), does ptrace reset the siginfo, or is the SIGUSR1 siginfo left in place for SIGSEGV?

But this all is minor, I think.

As this is being discussed for GDB, I would find it enough to just make $_siginfo accessible without these details.

Thanks, Jan
Re: gdbstub initial code, v7
On Fri, 03 Sep 2010 21:59:06 +0200, Roland McGrath wrote:
Currently GDB does not do anything special. That is, if there is a siginfo for signal SIGUSR1 but one does $C0B (SIGSEGV), does ptrace reset the siginfo, or is the SIGUSR1 siginfo left in place for SIGSEGV?

The kernel considers this sloppy behavior on the debugger's part. If you inject a different signal, we expect you should PTRACE_SETSIGINFO to something appropriate, or else that you really didn't care about the bits being accurate. If the resumption signal does not match the siginfo_t.si_signo, then the kernel resets the siginfo as if the debugger had just used kill with the new signal (i.e. si_pid, si_uid point to the ptracer).

OK, that seems to me the best choice. Sorry I did not test/read it.

Thanks, Jan
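The rule Roland describes can be written down as a small userspace model. This is only a sketch of the described behavior, not kernel code: siginfo_for_resume() and its parameters are hypothetical names, and the SI_USER code is my assumption of what "as if the debugger had just used kill" means for si_code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Model of the siginfo the tracee sees when resumed with RESUME_SIG while
   PENDING is the siginfo of the signal it stopped with.  If the signal
   numbers match, the siginfo is kept.  Otherwise it is rebuilt as if the
   tracer had sent RESUME_SIG with kill().  */
static siginfo_t
siginfo_for_resume (const siginfo_t *pending, int resume_sig,
                    pid_t tracer_pid, uid_t tracer_uid)
{
  siginfo_t out;

  if (pending->si_signo == resume_sig)
    return *pending;

  memset (&out, 0, sizeof out);
  out.si_signo = resume_sig;
  out.si_code = SI_USER;	/* Assumption: same code as kill().  */
  out.si_pid = tracer_pid;
  out.si_uid = tracer_uid;
  return out;
}
```

A debugger that wants the tracee to see accurate bits for an injected signal would therefore set them explicitly with PTRACE_SETSIGINFO before resuming, rather than rely on this reset.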
Re: gdbstub initial code, v7
On Mon, 30 Aug 2010 21:20:40 +0200, Jan Kratochvil wrote:
On Mon, 30 Aug 2010 20:58:50 +0200, Oleg Nesterov wrote:
- report signals. A bit more code changes than I expected.

BTW not sure if it is already the right time for it but to keep ugdb on-par with my linux-nat's re-post today (still not accepted in FSF GDB)

That's not true, this functionality needs no gdb/remote.c changes and its correctness relies just on ugdb (and it is probably not a problem for ugdb).

ugdb should support qXfer:siginfo, currently accessible only via $_siginfo print/set, though.

I am still sure this feature should also be implemented one day.

Thanks, Jan
Re: gdbstub initial code, v7
On Mon, 30 Aug 2010 20:58:50 +0200, Oleg Nesterov wrote: - report signals. A bit more code changes than I expected. BTW not sure if it is already the right time for it but to keep ugdb on-par with my linux-nat's re-post today (still not accepted in FSF GDB) [0/9]#2 Fix lost siginfo_t http://sourceware.org/ml/gdb-patches/2010-08/msg00480.html ugdb should support qXfer:siginfo, currently accessible only via $_siginfo print/set, though. Thanks, Jan
Re: Q: multiple inferiors, all-stop vCont
On Tue, 03 Aug 2010 18:53:59 +0200, Oleg Nesterov wrote:
On 08/03, Jan Kratochvil wrote:
On Tue, 03 Aug 2010 16:30:04 +0200, Oleg Nesterov wrote:
However, I do not really understand how this can work reliably in the terms of remote protocol. Somehow this scheme relies on the fact that gdb will send another vCont;t:pTGID.-1 _once again_ after the previous vCont;t:pTGID.-1, and gdbserver can report the other threads via Stop/vStopped. OK, I hope this doesn't matter.

attach_command_post_wait:
  /* At least the current thread is already stopped. */
  /* In all-stop, by definition, all threads have to be already stopped at this point. In non-stop, however, although the selected thread is stopped, others may still be executing. Be sure to explicitly stop all threads of the process. This should have no effect on already stopped threads. */
  if (non_stop)
    target_stop (pid_to_ptid (inferior->pid));

This just reflects the current situation with the current implementation.

gdb already did
  vAttach;PID
  vCont;t:pPID.-1
I do not see anything in the _documentation_ which could explain that only the main thread can be stopped despite the fact -1 means all threads.

-1 really means all threads - all those gdbserver knows about at that time. Anyway this double-stop issue is gdbserver/libthread_db specific and offtopic for ugdb.

Once again, I already understand why gdb + gdbserver work this way, I meant remote protocol in general.

In remote protocol - and even internally in gdbserver - -1 really always means all the (currently known) threads.

And in fact, I do not think your explanation is correct. Yes, this attach_command_post_wait() is called during attach. But even after that gdbserver reports only the main thread. This happens before qSymbol stage.

This attach_command_post_wait code is executed after the qSymbol command.

The first single-thread vCont:
#0 putpkt (buf=0x1f348b0 vCont;t:p517.-1) at remote.c:6730
#1 in remote_stop_ns (ptid=...) at remote.c:4709
#2 in remote_stop (ptid=...) at remote.c:4747
#3 in target_stop (ptid=...) at target.c:3031
#4 in attach_command (args=0x7fffd861 1303, from_tty=1) at infcmd.c:2436
#5 in do_cfunc (c=0x1db8bf0, args=0x7fffd861 1303, from_tty=1) at ./cli/cli-decode.c:67
#6 in cmd_func (cmd=0x1db8bf0, args=0x7fffd861 1303, from_tty=1) at ./cli/cli-decode.c:1771
#7 in execute_command (p=0x7fffd864 3, from_tty=1) at top.c:422
#8 in catch_command_errors (command=0x48a3e3 execute_command, arg=0x7fffd85a attach 1303, from_tty=1, mask=6) at exceptions.c:534
#9 in captured_main (data=0x7fffd360) at ./main.c:887

The second all-threads vCont:
#0 putpkt (buf=0x1f4ecb0 vCont;t:p517.-1) at remote.c:6730
#1 in remote_stop_ns (ptid=...) at remote.c:4709
#2 in remote_stop (ptid=...) at remote.c:4747
#3 in target_stop (ptid=...) at target.c:3031
#4 in attach_command_post_wait (args=0x1f3b6f0 1303, from_tty=1, async_exec=0) at infcmd.c:2334
#5 in attach_command_continuation (args=0x1f3b6a0) at infcmd.c:2355
#6 in do_my_cleanups (pmy_chain=0x7fffcd08, old_chain=0x0) at utils.c:421
#7 in do_all_inferior_continuations () at utils.c:692
#8 in inferior_event_handler (event_type=INF_EXEC_COMPLETE, client_data=0x0) at inf-loop.c:96
#9 in fetch_inferior_event (client_data=0x0) at infrun.c:2649
#10 in fetch_inferior_event_wrapper (client_data=0x0) at inf-loop.c:169
#11 in catch_errors (func=0x6b4287 fetch_inferior_event_wrapper, func_args=0x0, errstring=0xe378dd , mask=6) at exceptions.c:518
#12 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:65
#13 in remote_async_serial_handler (scb=0x1f30b00, context=0x0) at remote.c:10317
#14 in push_event (context=0x1f30b00) at ser-base.c:176
#15 in handle_timer_event (dummy=...) at event-loop.c:1306
#16 in process_event () at event-loop.c:399
#17 in gdb_do_one_event (data=0x0) at event-loop.c:452
#18 in catch_errors (func=0x6b0d2a gdb_do_one_event, func_args=0x0, errstring=0xe07943 , mask=6) at exceptions.c:518
#19 in tui_command_loop (data=0x0) at ./tui/tui-interp.c:171
#20 in current_interp_command_loop () at interps.c:291
#21 in captured_command_loop (data=0x0) at ./main.c:227
#22 in catch_errors (func=0x47ff66 captured_command_loop, func_args=0x0, errstring=0xdc6967 , mask=6) at exceptions.c:518
#23 in captured_main (data=0x7fffd360) at ./main.c:910

But, it is very possible I missed something. And again, I think (I hope ;) we can forget this because the simple method works too.

This discussion is really offtopic for ugdb.

I was afraid there is some other reason why we can't avoid libthread_db.

Roland has correctly pointed out the TLS support. But that will come later.

Yes, I do understand vAttach issues, but I thought that attach command should always hide these details. From the documentation: attach PROCESS-ID
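For reference, the vCont;t:pPID.-1 request seen in the backtraces can be framed as sketched below. This is not code from gdb, gdbserver or ugdb; rsp_frame() and rsp_stop_all_threads() are hypothetical helpers. It only relies on the remote protocol framing ($payload#checksum, the checksum being the modulo-256 sum of the payload bytes as two hex digits) and on process and thread IDs being written in hex, which is why "attach 1303" shows up as p517 above. Escaping of special bytes in the payload is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Frame PAYLOAD as "$payload#xx" into OUT.  Returns the packet length,
   or -1 if it does not fit into OUTSZ bytes.  */
static int
rsp_frame (char *out, size_t outsz, const char *payload)
{
  unsigned char sum = 0;
  const char *p;
  int n;

  for (p = payload; *p != '\0'; p++)
    sum += (unsigned char) *p;

  n = snprintf (out, outsz, "$%s#%02x", payload, sum);
  return (n < 0 || (size_t) n >= outsz) ? -1 : n;
}

/* Build the request to stop all (currently known) threads of process PID:
   vCont;t:pPID.-1 with PID in hex.  */
static int
rsp_stop_all_threads (char *out, size_t outsz, int pid)
{
  char payload[64];
  int n = snprintf (payload, sizeof payload, "vCont;t:p%x.-1",
                    (unsigned int) pid);

  if (n < 0 || (size_t) n >= sizeof payload)
    return -1;
  return rsp_frame (out, outsz, payload);
}
```

Called with pid 1303 this produces a packet starting with $vCont;t:p517.-1# followed by the two checksum digits.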
Re: gdbstub initial code, another approach
On Wed, 28 Jul 2010 20:17:02 +0200, Oleg Nesterov wrote:
- the testing was very limited. I played with it about an hour and didn't find any problems, but that is all.
[...]
Btw, gdb crashes very often right after
(gdb) set target-async on
(gdb) set non-stop
(gdb) file mt-program
(gdb) target extended-remote :port
(gdb) attach its_pid
I didn't even try to investigate (this doesn't happen when it works with the real gdbserver). Just retry, gdb is buggy.

Trying it with both /bin/sleep and a threaded testcase, I never got a crash (kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).

$ killall gdbstub;~/redhat/threaditp=$!;~/redhat/gdbstub ~/redhat/outsleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex file $HOME/redhat/threadit -ex 'target extended-remote :2000' -ex attach $p -ex 'set confirm no';kill $p;
gdbstub: no process killed
[6] 22822
[7] 22823
GNU gdb (GDB) 7.2.50.20100802-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as x86_64-unknown-linux-gnu.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Reading symbols from /home/jkratoch/redhat/threadit...done.
Remote debugging using :2000
Attached to process 22822
[New Thread 22822.22822]
[New Thread 22822.22825]
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x7fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
89 lll_wait_tid (pd->tid);
(gdb) [Thread 22822.22825] #2 stopped.
0x7fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Current language: auto
The current source language is auto; currently asm.
info threads
  2 Thread 22822.22825 0x7fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 22822.22822 0x7fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
(gdb) q
[7]+ Done ~/redhat/gdbstub ~/redhat/out
[6]+ Terminated ~/redhat/threadit

Thanks, Jan
Re: gdbstub initial code, another approach
On Fri, 30 Jul 2010 16:41:24 +0200, Oleg Nesterov wrote: IOW, you think that it is better to shift gdbserver into kernel-space than port the existing one to the new API or write the new one in user space ? So far I just assumed kernel-space ugdb is the plan. As I wrote before I do not know gdbserver too much. If you check gdb/gdbserver/linux-low.c it is just one big ptrace/wait/\/proc interface. I would guess it could be more simple with the utrace API at hand. Catching up with systemtap's 200x higher software-watchpoint performance over current (local) gdb (described in [debug-list] Utrace Discussion Notes off this list) could be easier with in-kernel gdb I thought. Thanks, Jan
Re: clone bug (glibc?) (Was: clone-multi-ptrace test failure)
On Tue, 01 Dec 2009 20:39:40 +0100, Roland McGrath wrote:
I think the best bet is to link with -Wl,-z,now and then minimize the library code you rely on.

Checked-in the fix of at least Fedora 12 x86_64 below.

getppid() does not look to be needed there - PTRACE_SYSCALL does stop (WIFSTOPPED) on the entry (before WIFEXITED) to __NR_exit keeping the PASS/FAIL reproducibility.

Regards, Jan

--- Makefile.am 29 Nov 2009 02:23:25 - 1.60
+++ Makefile.am 14 Dec 2009 09:47:54 - 1.61
@@ -111,6 +111,8 @@ stopped_attach_transparency_LDFLAGS = -l
 erestartsys_trap_LDFLAGS = -lutil
 erestartsys_trap_debugger_LDFLAGS = -lutil
 erestartsys_trap_32fails_debugger_LDFLAGS = -lutil
+# After clone syscall it must call no glibc code (such as _dl_runtime_resolve).
+clone_multi_ptrace_LDFLAGS = -Wl,-z,now
 
 check_TESTS = $(SAFE)
 xcheck_TESTS = $(CRASHERS)
--- clone-multi-ptrace.c 5 Dec 2008 14:41:57 - 1.6
+++ clone-multi-ptrace.c 14 Dec 2009 09:47:54 - 1.7
@@ -65,10 +65,10 @@ static char grandchild_seen[THREAD_NUM];
 static int
 grandchild_func (void *unused)
 {
-  /* Need to have at least one syscall before exit */
-  getppid ();
-  /* _exit() would make ALL threads to exit. We need rew syscall */
+  /* _exit() would make ALL threads to exit. We need rew syscall. After the
+     clone syscall it must call no glibc code (such as _dl_runtime_resolve). */
   syscall (__NR_exit, 22);
+  return 0;
 }
Re: [PATCH v2] ptrace-tests: fix step-fork.c on powerpc for ptrace-utrace
On Tue, 01 Dec 2009 18:38:27 +0100, Veaceslav Falico wrote:
Instead of using fork(), call syscall(__NR_fork) in step-fork.c to avoid looping on powerpc arch in libc.

Checked-in.

(Not seen any problems with syscall and using glibc afterwards as in the clone-multi-ptrace.c case so left it as is.)

Regards, Jan

Signed-off-by: Veaceslav Falico vfal...@redhat.com
---
--- a/ptrace-tests/tests/step-fork.c 2009-12-01 17:17:14.0 +0100
+++ b/ptrace-tests/tests/step-fork.c 2009-12-01 18:35:15.0 +0100
@@ -29,6 +29,7 @@
 #include <unistd.h>
 #include <sys/wait.h>
 #include <string.h>
+#include <sys/syscall.h>
 #include <signal.h>
 
 #ifndef PTRACE_SINGLESTEP
@@ -78,7 +79,12 @@ main (int argc, char **argv)
   sigprocmask (SIG_BLOCK, &mask, NULL);
   ptrace (PTRACE_TRACEME);
   raise (SIGUSR1);
-  if (fork () == 0)
+
+  /*
+   * Can't use fork() directly because on powerpc it loops inside libc under
+   * PTRACE_SINGLESTEP. See http://marc.info/?l=linux-kernel&m=125927241130695
+   */
+  if (syscall(__NR_fork) == 0)
     {
       read (-1, NULL, 0);
       _exit (22);
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set ->ops = utrace_detached_ops lockless)
On Wed, 09 Dec 2009 19:12:41 +0100, Oleg Nesterov wrote:
while the '.func_name' is the text address. tried to change the code to
  REGS_ACCESS (regs, nip) = (unsigned long) .raise_sigusr2
but gcc doesn't like this ;)
...
Yes, I verified the patch below fixes step-jump-cont.c on ibm-js20-02.lab.bos.redhat.com.

Checked-in a similar patch but same as used now in other testcases, sorry for not using the patch of yours.

Regards, Jan

--- step-jump-cont.c 8 Dec 2008 18:23:41 - 1.12
+++ step-jump-cont.c 14 Dec 2009 11:38:37 - 1.13
@@ -213,6 +213,24 @@ int main (void)
   REGS_ACCESS (regs, eip) = (unsigned long) raise_sigusr2;
 #elif defined __x86_64__
   REGS_ACCESS (regs, rip) = (unsigned long) raise_sigusr2;
+#elif defined __powerpc64__
+  {
+    /* ppc64 `raise_sigusr2' resolves to the function descriptor. */
+    union
+      {
+        void (*f) (void);
+        struct
+          {
+            void *entry;
+            void *toc;
+          }
+        *p;
+      }
+    const func_u = { raise_sigusr2 };
+
+    REGS_ACCESS (regs, nip) = (unsigned long) func_u.p->entry;
+    REGS_ACCESS (regs, gpr[2]) = (unsigned long) func_u.p->toc;
+  }
 #elif defined __powerpc__
   REGS_ACCESS (regs, nip) = (unsigned long) raise_sigusr2;
 #else
Re: Tests Failures on PPC64
On Wed, 09 Dec 2009 19:31:52 +0100, Oleg Nesterov wrote:
Hmm. it is obviously racy, static volatile unsigned started is not atomic and thus the main thread can hang doing while (started < THREADS); not that I think this explains the failure though.

Thanks, fixed (but the problem is not reproducible for me).

Regards, Jan

--- ppc-dabr-race.c 8 Dec 2008 18:23:41 - 1.8
+++ ppc-dabr-race.c 14 Dec 2009 12:03:49 - 1.9
@@ -141,13 +141,14 @@ handler_fail (int signo)
   assert (0);
 }
 
+/* STARTED requires atomic access. */
 static volatile unsigned started;
 
 static void *child_thread (void *data)
 {
   pid_t tid = gettid ();
 
-  started++;
+  __sync_add_and_fetch (&started, 1);
 
   /* We should stay in the syscall - better race probability. */
   sleep (1);
@@ -178,7 +179,7 @@ static void child_func (void)
       assert (i == 0);
     }
 
-  while (started THREADS);
+  while (__sync_add_and_fetch (&started, 0) < THREADS);
 
   l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
   assert (l == 0);
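The pattern used by the fix can be reduced to the following sketch. It is not the testcase itself: mark_started(), all_started() and the THREADS value are hypothetical, and it assumes a compiler that provides the GCC __sync builtins. Each thread would call mark_started() once, and the main thread would spin on all_started().

```c
#include <assert.h>

#define THREADS 3		/* Hypothetical thread count.  */

/* STARTED requires atomic access: a plain started++ is a read-modify-write
   and concurrent increments may be lost.  */
static volatile unsigned started;

/* Called once by every thread when it begins running.  */
static void
mark_started (void)
{
  __sync_add_and_fetch (&started, 1);
}

/* Adding zero is used here as an atomic read with a full barrier.  */
static int
all_started (void)
{
  return __sync_add_and_fetch (&started, 0) >= THREADS;
}
```

With a lost increment the plain counter would never reach THREADS and the waiting loop would hang, which is the failure mode Oleg points out.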
Re: step-into-handler.c compilation failure on ppc64
On Sat, 05 Dec 2009 18:19:20 +0100, Roland McGrath wrote:
How about this?
--- step-into-handler.c 10 Dec 2008 04:42:43 -0800 1.8
+++ step-into-handler.c 05 Dec 2009 09:18:54 -0800
[...]
@@ -113,11 +114,11 @@ handler_alrm_get (void)
 {
 #if defined __powerpc64__
   /* ppc64 `handler_alrm' resolves to the function descriptor. */
-  return *(void **) handler_alrm;
+  return *(void **) (uintptr_t) handler_alrm;
   /* __s390x__ defines both the symbols. */
 #elif defined __s390__ && !defined __s390x__
   /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary. */
[...]

On Sat, 05 Dec 2009 18:39:05 +0100, CAI Qian wrote:
Thanks. Fixed.

I have to say it did not help for me (gcc-4.4.2-7.el6.ppc64).
error: dereferencing type-punned pointer will break strict-aliasing rules

Checked-in the union-based fix below (both tests PASS on ppc64).

Regards, Jan

--- erestartsys.c 27 Nov 2009 22:50:31 - 1.13
+++ erestartsys.c 14 Dec 2009 00:38:42 - 1.14
@@ -38,6 +38,7 @@
 #include <stddef.h>
 #include <pty.h>
 #include <string.h>
+#include <stdint.h>
 
 #if defined __x86_64__
 # define REGISTER_IP .rip
@@ -298,8 +299,23 @@ main (int argc, char **argv)
   user = user_orig;
   user REGISTER_IP = (unsigned long) func;
 #ifdef __powerpc64__
-  user.nip = ((const unsigned long *) func)[0]; /* entry */
-  user.gpr[2] = ((const unsigned long *) func)[1]; /* TOC */
+  {
+    /* ppc64 `func' resolves to the function descriptor. */
+    union
+      {
+        void (*f) (void);
+        struct
+          {
+            void *entry;
+            void *toc;
+          }
+        *p;
+      }
+    const func_u = { func };
+
+    user.nip = (uintptr_t) func_u.p->entry;
+    user.gpr[2] = (uintptr_t) func_u.p->toc;
+  }
 #endif
 /* GDB amd64_linux_write_pc(): */
 /* We must be careful with modifying the program counter. If we
--- step-into-handler.c 8 Dec 2008 18:23:41 - 1.8
+++ step-into-handler.c 14 Dec 2009 00:38:42 - 1.9
@@ -113,7 +113,19 @@ handler_alrm_get (void)
 {
 #if defined __powerpc64__
   /* ppc64 `handler_alrm' resolves to the function descriptor. */
-  return *(void **) handler_alrm;
+  union
+    {
+      void (*f) (int signo);
+      struct
+        {
+          void *entry;
+          void *toc;
+        }
+      *p;
+    }
+  const func_u = { handler_alrm };
+
+  return func_u.p->entry;
   /* __s390x__ defines both the symbols. */
 #elif defined __s390__ && !defined __s390x__
   /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary. */
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set ->ops = utrace_detached_ops lockless)
On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
But. raise_sigusr2 is not equal to the actual address of raise_sigusr2(), this value points to the thunk (I do not know the correct English term)

ppc64 calls it function descriptor (GDB ppc64_linux_convert_from_func_ptr_addr):
For PPC64, a function descriptor is a TOC entry, in a data section, which contains three words: the first word is the address of the function, the second word is the TOC pointer (r2), and the third word is the static chain value.

(gdb) x/8gx 0x805b6f6258
0x805b6f6258 open:0x00805b65cf68 0x00805b702ac0
0x805b6f6268 open64: 0x00805b65d010 0x00805b702ac0
(gdb) x/20i 0x00805b65cf68
0x805b65cf68 .__GI___open:lwz r10,-30432(r13)
0x805b65cf6c .__GI___open+4: cmpwi r10,0
0x805b65cf70 .__GI___open+8: bne-0x805b65cf84 .__GI___open+28
(gdb) info sym 0x00805b702ac0
last_nip in section .bss

I was not aware there is any third word before and I do not see it there.

Regards, Jan
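The descriptor layout quoted above can be sketched as a C structure. This is only an illustration under the assumption of 64-bit words as in the GDB comment; struct ppc64_func_desc and read_func_desc() are hypothetical names, and the copy is done with memcpy so it does not run into the strict-aliasing error that the cast-based variants hit. On a real ppc64 system using function descriptors one would pass the value of the function pointer; here a descriptor is simply simulated in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three words of a ppc64 function descriptor, as described in the
   GDB comment quoted above.  */
struct ppc64_func_desc
{
  uint64_t entry;	/* Address of the function's first instruction.  */
  uint64_t toc;		/* TOC pointer, loaded into r2.  */
  uint64_t env;		/* Static chain value.  */
};

/* Copy the descriptor stored at DESC_ADDR.  */
static struct ppc64_func_desc
read_func_desc (const void *desc_addr)
{
  struct ppc64_func_desc d;

  memcpy (&d, desc_addr, sizeof d);
  return d;
}
```

A tracer redirecting the tracee to such a function would put the entry word into nip and the toc word into gpr[2], as the testcase fixes in this thread do.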
Re: utrace-ptrace gdb testsuite tesults
On Wed, 25 Nov 2009 23:30:37 +0100, Jan Kratochvil wrote: Please point at some built or easily buildable kernel .rpm first. http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/ OK, taken for reverification. Followed the differences found by Qian and verified none of them (did not verify the ppc suspicious one) has any regression in GDB testsuite. Regards, Jan
Re: utrace-ptrace gdb testsuite tesults
On Sun, 29 Nov 2009 23:39:59 +0100, Jan Kratochvil wrote:
Followed the differences found by Qian and verified none of them (did not verify the ppc suspicious one) has any regression in GDB testsuite.

Forgot the log FYI.

Regards, Jan

-result-2.6.31.5-127.fc12.x86_64/gdb
+result-2.6.32-0.53.rc8.496.fc13.x86_64/gdb

*attach*stop* generally unchecked, it should be covered by ptrace-testsuite and the GDB testcases are currently racy.

-FAIL: gdb.base/follow-child.exp: break
+PASS: gdb.base/follow-child.exp: break
= unstable testcase, RH-specific, dropped as redundant to other testcases

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m64/gdb/testsuite
-PASS: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt
-PASS: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
+FAIL: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt (timeout)
+FAIL: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
= racy, ignored
-PASS: gdb.base/foll-fork.exp: default parent follow, no catchpoints
+FAIL: gdb.base/foll-fork.exp: (timeout) default parent follow, no catchpoints
= racy, fixed the testcase upstream

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m64/gdb/testsuite.unix.-m32
-FAIL: gdb.threads/attach-stopped.exp: threaded: attach1, exit leaves process stopped
+PASS: gdb.threads/attach-stopped.exp: threaded: attach1, exit leaves process stopped
= racy, ignored
-FAIL: gdb.base/interrupt.exp: continue
+PASS: gdb.base/interrupt.exp: continue
 FAIL: gdb.base/interrupt.exp: echo data (timeout)
 ERROR: Undefined command .
 UNRESOLVED: gdb.base/interrupt.exp: Send Control-C, second time
 FAIL: gdb.base/interrupt.exp: signal SIGINT (the program is no longer running)
-FAIL: gdb.base/interrupt.exp: echo more data (timeout)
-FAIL: gdb.base/interrupt.exp: send end of file
+PASS: gdb.base/interrupt.exp: echo more data
+FAIL: gdb.base/interrupt.exp: send end of file (eof)
= both kernels behave the same - correctly, updated erestart* tests set, for x86_64-x86_64-i386 (kernel-debugger-inferior) GDB needs a fix: http://sourceware.org/ml/gdb-patches/2009-11/msg00592.html
-FAIL: gdb.server/ext-run.exp: get process list
+PASS: gdb.server/ext-run.exp: get process list
= upstream gdbserver data corruption
-FAIL: gdb.java/jnpe.exp: next over NPE
+PASS: gdb.java/jnpe.exp: next over NPE
= fixed the testcase in archer

/root/jkratoch/redhat/gdb-7.0-3.fc12.src/gdb-7.0-m32/gdb/testsuite.unix.-m32
-ERROR: Couldn't send info inferior 16 to GDB.
-UNRESOLVED: gdb.base/multi-forks.exp: Did kill 16
+PASS: gdb.base/multi-forks.exp: Run to exit 11
= always ignored by me, IMO racy
-PASS: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt
-PASS: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
+FAIL: gdb.threads/attachstop-mt.exp: attach4 stop by interrupt (timeout)
+FAIL: gdb.threads/attachstop-mt.exp: attach4, exit leaves process sleeping
= racy, ignored
-FAIL: gdb.cp/constructortest.exp: running to main in runto
 PASS: gdb.cp/constructortest.exp: breaking on A::A
-FAIL: gdb.cp/constructortest.exp: continue to breakpoint: First line A
-FAIL: gdb.cp/constructortest.exp: Verify in in-charge A::A
-FAIL: gdb.cp/constructortest.exp: continue to breakpoint: First line A
-FAIL: gdb.cp/constructortest.exp: Verify in not-in-charge A::A
+PASS: gdb.cp/constructortest.exp: continue to breakpoint: First line A
+PASS: gdb.cp/constructortest.exp: Verify in in-charge A::A
+PASS: gdb.cp/constructortest.exp: continue to breakpoint: First line A
+PASS: gdb.cp/constructortest.exp: Verify in not-in-charge A::A
-FAIL: gdb.pie/break.exp: run until function breakpoint
-FAIL: gdb.pie/break.exp: run until breakpoint set at a line number
-FAIL: gdb.pie/break.exp: run until file:function(6) breakpoint
-FAIL: gdb.pie/break.exp: run until file:function(5) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(4) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(3) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(2) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:function(1) breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until quoted breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: run until file:linenum breakpoint (the program is no longer running)
-FAIL: gdb.pie/break.exp: breakpoint offset +1
-FAIL: gdb.pie/break.exp: step onto breakpoint (the program is no longer running)
+PASS: gdb.pie/break.exp: run until function breakpoint
+PASS: gdb.pie/break.exp: run until breakpoint set at a line number
+PASS: gdb.pie/break.exp: run until file:function(6) breakpoint
+PASS: gdb.pie/break.exp: run until file:function(5) breakpoint
+PASS: gdb.pie/break.exp: run until file:function(4) breakpoint
+PASS: gdb.pie
Re: utrace-ptrace gdb testsuite tesults
On Fri, 27 Nov 2009 15:11:09 +0100, Veaceslav Falico wrote:
-FAIL: gdb.base/foll-fork.exp: unpatch child, unpatched parent breakpoints from child (timeout)
+PASS: gdb.base/foll-fork.exp: unpatch child, unpatched parent breakpoints from child
-PASS: gdb.base/foll-fork.exp: set follow parent, hit tbreak
+FAIL: gdb.base/foll-fork.exp: (timeout) set follow parent, hit tbreak

To be ignored, fixed upstream: http://sourceware.org/ml/gdb-patches/2009-11/msg00573.html

-PASS: gdb.mi/mi-nsmoribund.exp: resume all, program exited normally
+FAIL: gdb.mi/mi-nsmoribund.exp: unexpected stop
-KFAIL: gdb.threads/watchthreads2.exp: gdb can drop watchpoints in multithreaded app (PRMS: gdb/10116)
+PASS: gdb.threads/watchthreads2.exp: all threads incremented x

These are known to be unstable, but there are some known watch and non-stop problems, so it may not even be a testcase-side bug.

Therefore this test shows no changes/regressions.

Regards, Jan
Re: utrace-ptrace gdb testsuite tesults
On Fri, 27 Nov 2009 15:34:05 +0100, Oleg Nesterov wrote: Jan, if you see something particular which needs more attention or should be fixed, please let me know. I'll try to investigate then. I am still not finished with the verifications yesterday but so far no kernel behavior change has been proven and I doubt it will be. Going to reply today. The ppc kernel should be checked but I do not have built two non-utrace/utrace matching kernel rpms for it. Regards, Jan
Re: GDB Testsuite Results with CONFIG_UTRACE i686
Hi, the gdb.pie/break.exp change would be worth checking more but this is based on the old PIE patch with various known problems and for RHEL-6 there will be a different/new PIE patch implementation. Also the gdb.base/bigcore.exp and gdb.base/follow-child.exp changes would be worth checking if the change is stable across multiple runs of the specific testcase. You can also check gdb.log differences, sometimes it is apparent the change is OK. Otherwise if the change is stable across multiple runs and it is not obvious to you why it did change as you already have the machine ready could you please provide the hostname/password/etc. there? Thanks, Jan
Re: GDB Testsuite Results on POWERPC
On Wed, 25 Nov 2009 09:59:11 +0100, Ananth N Mavinakayanahalli wrote:
Essentially, there is *no* change in any of the numbers with and without ptrace over utrace.

While that is probably so, please rather check a diff of the *.sum files. Some of the results are fuzzy and, in a rare case, two results changing FAIL->PASS and PASS->FAIL cancel out and will not show in this summary.

Thanks, Jan
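A small sketch of why equal summary numbers are not enough: it counts FAILs per run and, separately, the per-test flips. This is not part of any testsuite tooling; enum outcome, struct sum_diff and compare_runs() are hypothetical names for this illustration, and the two arrays are assumed to list the same tests in the same order.

```c
#include <assert.h>
#include <stddef.h>

enum outcome { OUTCOME_PASS, OUTCOME_FAIL };

struct sum_diff
{
  int fails_before;	/* What the summary of the first run shows.  */
  int fails_after;	/* What the summary of the second run shows.  */
  int pass_to_fail;	/* Regressions, visible only per test.  */
  int fail_to_pass;	/* Fixes, visible only per test.  */
};

/* Compare the outcomes of the same N tests from two runs.  */
static struct sum_diff
compare_runs (const enum outcome *before, const enum outcome *after,
              size_t n)
{
  struct sum_diff d = { 0, 0, 0, 0 };
  size_t i;

  for (i = 0; i < n; i++)
    {
      d.fails_before += before[i] == OUTCOME_FAIL;
      d.fails_after += after[i] == OUTCOME_FAIL;
      if (before[i] == OUTCOME_PASS && after[i] == OUTCOME_FAIL)
        d.pass_to_fail++;
      else if (before[i] == OUTCOME_FAIL && after[i] == OUTCOME_PASS)
        d.fail_to_pass++;
    }
  return d;
}
```

With one test going PASS->FAIL and another FAIL->PASS, fails_before equals fails_after although a regression happened, which is the case the per-test diff of the *.sum files catches.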
Re: utrace-ptrace gdb testsuite tesults
On Wed, 25 Nov 2009 22:17:15 +0100, Roland McGrath wrote: In general everything where is a word thread has unstable results and nonstop tests are also a bit unstable. So where exactly is the problem in these cases? Are the tests overly timing-sensitive where there is no actual behavior bug? Or is gdb overly timing-sensitive where there is no actual kernel bug? Or is it just unknown, and might be a kernel bug after all (even an undiagnosed one in vanilla kernels)? gdb.server/server-run.exp: gdbserver contains data overflow/corruption, occasionally it crashes, occasionally passes. gdb.mi/mi-nonstop-exit.exp: Some race in GDB non-stop code. gdb.threads/attach-stopped.exp: Race in the testcase (I think so). etc. But in most cases I do not know, gdb.log is commonly not enough to find the problem and when it is not reproducible on the 2nd..nth run... But I+upstream already caught many races but still a lot of them remains. There are IMO/hopefully very few cases tested by the gdb testsuite and still not covered by the ptrace-testsuite, I even do not much expect we will see again a new utrace regression caught by the gdb testsuite uncaught by the ptrace-testsuite. That's certainly good to hear. If you are pretty confident about that, then I am quite happy to consider nonregression on all of ptrace-tests the sole gating test for kernel changes. We just don't want to wind up having other upstream reviewers notice a regression using gdb that we didn't notice before we submitted a kernel change. I did not verify the GDB codebase for all the ptrace calls in any way. If it is a kernel patch submit after long development period it is probably still worth checking it against GDB. Please point at some built or easily buildable kernel .rpm first. http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/ OK, taken for reverification. Regards, Jan
Re: [PATCH 1-13] utrace-ptrace V1, for internal review
On Tue, 24 Nov 2009 12:31:41 +0100, Srikar Dronamraju wrote:
When I get the latest set of ptrace-tests by using.
cvs -d :pserver:anoncvs:anon...@sources.redhat.com:/cvs/systemtap co ptrace-tests

1. Am I using the right source of ptrace-tests or has its location changed.

It is right, webpage at: http://sourceware.org/systemtap/wiki/utrace/tests

2. Are these new testcases x86 architecture specific?
step-from-clone + syscall-from-clone

Fixed/checked-in, now they SKIP (rc 77) on unsupported arches.
New support for ppc/ppc64 PASSes.
New support for s390/s390x FAILs (kernel-2.6.18-164.6.1.el5.s390x). orig_gpr2 seems to be erroneously set to the retval (gprs[2]).

3. Shouldn't arch/powerpc/include/asm/user.h not define user_regs_struct?

Not sure why but ppc uses `struct pt_regs'.

Thanks, Jan
Re: utrace-ptrace detach with signal semantics
On Sat, 10 Oct 2009 18:24:21 +0200, Oleg Nesterov wrote:
On 10/06, Jan Kratochvil wrote:
Yes, I agree with the current general behavior of ptrace there is missing:
if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
[...snip...]
--- attach-into-signal.c 31 Jan 2009 21:11:40 - 1.5
+++ attach-into-signal.c 6 Oct 2009 14:27:08 -
@@ -224,6 +224,18 @@ static void reproduce (void)
       child = 0;
       return;
     }
+  /* SIGPIPE was still pending and it has not been yet delivered. */
+  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
+    {
+      /* Deliver it and get the queued SIGSTOP. */
+      errno = 0;
+      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
+      assert_perror (errno);
+
+      errno = 0;
+      pid = waitpid (child, &status, 0);
+      assert (pid == child);
+    }
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
   /* let tracee run. it must be killed very soon by SIGPIPE */

Jan, please revert this change.

Reverted (VERBOSE-caught with FAIL now).

Jan
Re: utrace-ptrace detach with signal semantics
On Sat, 10 Oct 2009 18:48:29 +0200, Oleg Nesterov wrote:
> On 10/10, Jan Kratochvil wrote:
> > (VERBOSE-caught with FAIL now).
>
> Cough. please translate this to me ;)

(Cc of each such mail to Roland does not look OK to me but removing Ccs is also not OK)

--- tests/attach-into-signal.c 6 Oct 2009 19:21:35 -0000 1.6
+++ tests/attach-into-signal.c 10 Oct 2009 16:47:46 -0000 1.7
@@ -227,14 +227,8 @@ static void reproduce (void)
   /* SIGPIPE was still pending and it has not been yet delivered.  */
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
     {
-      /* Deliver it and get the queued SIGSTOP.  */
-      errno = 0;
-      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
-      assert_perror (errno);
-
-      errno = 0;
-      pid = waitpid (child, &status, 0);
-      assert (pid == child);
+      VERBOSE ("Forbidden to catch pending signal from PTRACE_DETACH");
+      exit (1);
     }
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
Re: [PATCH 83] ptrace(DETACH, SIGKILL) should really kill the tracee
On Sat, 10 Oct 2009 18:17:12 +0200, Oleg Nesterov wrote:
> Roland, Jan, what user-space expects ptrace(DETACH, SIGKILL) should do?
> My guess: this should really kill the tracee asap, hence this patch.

The attached testcase works for me on both:
  kernel-2.6.31.1-48.fc12.x86_64
  kernel-2.6.30.5-43.fc11.x86_64
Does it FAIL for you, so that it is worth adding to the testsuite?

Thanks, Jan

/* Test case for (PTRACE_DETACH, SIGKILL) really does kill the tracee.

   This software is provided 'as-is', without any express or implied
   warranty.  In no event will the authors be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely.  */

#define _GNU_SOURCE 1
#include <assert.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <errno.h>

static pid_t child;

static void
cleanup (void)
{
  if (child > 0)
    kill (child, SIGKILL);
  child = 0;
}

static void
handler_fail (int signo)
{
  cleanup ();
  signal (signo, SIG_DFL);
  raise (signo);
}

int
main (void)
{
  pid_t got_pid;
  int status;
  long l;

  atexit (cleanup);
  signal (SIGABRT, handler_fail);
  signal (SIGINT, handler_fail);

  child = fork ();
  switch (child)
    {
    case -1:
      assert_perror (errno);
    case 0:
      l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
      assert (l == 0);
      raise (SIGUSR1);
      _exit (42);
    default:
      break;
    }

  got_pid = waitpid (child, &status, 0);
  assert (got_pid == child);
  assert (WIFSTOPPED (status));
  assert (WSTOPSIG (status) == SIGUSR1);

  errno = 0;
  l = ptrace (PTRACE_DETACH, child, NULL, (void *) (long) SIGKILL);
  assert_perror (errno);
  assert (l == 0);

  got_pid = waitpid (child, &status, 0);
  assert (got_pid == child);
  assert (WIFSIGNALED (status));
  assert (WTERMSIG (status) == SIGKILL);

  return 0;
}
Re: utrace-ptrace detach with signal semantics
On Wed, 07 Oct 2009 15:33:49 +0200, Oleg Nesterov wrote:
> On 10/06, Jan Kratochvil wrote:
> > It should work also for PTRACE_SINGLESTEP.
>
> Heh. Yes, but with one exception.
>
> - the tracee has a handler for, say, SIGHUP
> - the tracee dequeues SIGHUP, reports to the tracer, and stops.
> - the tracer does ptrace(SINGLESTEP, SIGHUP)
>   // it could use another signr, this works.
>   // but the tracer must have a handler or
>   // everything is OK.
> - the tracee delivers SIGHUP to itself, handle_signal() notices
>   TIF_SINGLESTEP and calls ptrace_notify().
>
> Now, the tracee reports SIGTRAP, but the next time the tracer does
> ptrace(WHATEVER, SIGNR) SIGNR will be ignored.

OK, this is really a border case I did not mean. In such a case the SIGHUP handler is still not fully executing, and as the non-realtime signals do not nest (count) it is OK that it gets activated only once.

I did mean some more normal case of:
  ptrace (PTRACE_SINGLESTEP, 0) = 0
  waitpid () = SIGTRAP
  ptrace (PTRACE_SINGLESTEP, 0) = 0
  waitpid () = SIGTRAP
  ptrace (PTRACE_DETACH, SIGSTOP) = 0
which I assume will work.

Thanks, Jan
Re: utrace-ptrace detach with signal semantics
On Tue, 06 Oct 2009 15:10:10 +0200, Oleg Nesterov wrote: On 10/06, Jan Kratochvil wrote: On Mon, 05 Oct 2009 21:00:37 +0200, Oleg Nesterov wrote: On 10/05, Jan Kratochvil wrote: Naive programs expect the first signal after PTRACE_ATTACH will be SIGSTOP. They should not, this is just wrong. That may be a right point but such programs are in use out there. Sure if it would be a real difficulty one can keep it as-is as GDB-7.0 soon to be released has it already fixed, strace works with it. Still ltrace crashes the inferior in such case. Confused. Do you mean we should fix the kernel to match this expectation? Yes, I was thinking it would be a good idea. This was never true. I agree. You attached the test-case which sends SIGALRM to itself in a loop. If the tracer attaches to this program, it is very possible that SIGALRM will be reported, not SIGSTOP. Yes. And the real world tracers do not expect so. This test-case also does:
  /* detach with SIGPIPE/attach. This should kill tracee */
  ptrace (PTRACE_DETACH, child, (void *) 1, (void *) SIGPIPE);
  ptrace (PTRACE_ATTACH, child, (void *) 0, (void *) 0);
  waitpid (child, &status, 0);
  assert (WIFSIGNALED (status) && WTERMSIG (status) == SIGPIPE);
It fails if the second PTRACE_ATTACH sees SIGPIPE. This is what I can't understand. Second keyword PTRACE_ATTACH is on line 167. First keyword SIGPIPE is on line 199. Line 167 cannot see anything from line 199. Assuming you did mean third PTRACE_ATTACH.
  Line 222: if (WIFSIGNALED (status) && WTERMSIG (status) == SIGPIPE)
  Line 227: assert (WIFSTOPPED (status));
  Line 228: assert (WSTOPSIG (status) == SIGSTOP);
Yes, I agree with the current general behavior of ptrace there is missing:
  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
It fails on ptrace-over-utrace. Attached, I do not have kernel.rpm with ptrace-over-utrace ready, please verify it and check it in (or ping me or so). Once again. Suppose that the tracer does ptrace(PTRACE_DETACH, SIGXXX).
Currently, if the next tracer attaches right after this detach it has no way to intercept SIGXXX, it will be never reported via ptrace_signal(). No matter if it gets reported to the new tracer still SIGXXX should never get lost. If it is not reported to the new tracer then it will be always processed by the tracee, is it right? Is this really important? Do you know any application which can be broken if we change this behaviour? With the current utrace-ptrace implementation SIGXXX can be reported to the new tracer. I think there is no application which would handle non-SIGSTOP as the first signal after PTRACE_ATTACH while it would get confused by getting non-SIGSTOP signal as the first one after PTRACE_ATTACH after PTRACE_DETACH. OK, this relates to "the first signal should be SIGSTOP" but this is wrong anyway, and the case above is very unlikely. If "the first signal should be SIGSTOP" is not satisfied (current state) I think you can freely change this behavior whether SIGXXX will be reported to the new tracer and we should apply the attached ptrace-testsuite patch. If "the first signal should be SIGSTOP" gets fixed/satisfied (proposed state) I think it is clear SIGXXX from PTRACE_DETACH must not be lost and it must not be visible as the first signal after PTRACE_ATTACH. In such case the ptrace-testsuite testcase attach-into-signal should be simplified a lot to always just require SIGSTOP as the first signal after PTRACE_ATTACH and the attached change gets irrelevant in such case.

Thanks, Jan

--- attach-into-signal.c 31 Jan 2009 21:11:40 -0000 1.5
+++ attach-into-signal.c 6 Oct 2009 14:27:08 -0000
@@ -224,6 +224,18 @@ static void reproduce (void)
       child = 0;
       return;
     }
+  /* SIGPIPE was still pending and it has not been yet delivered.  */
+  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE)
+    {
+      /* Deliver it and get the queued SIGSTOP.  */
+      errno = 0;
+      ptrace (PTRACE_CONT, child, (void *) 1, (void *) SIGPIPE);
+      assert_perror (errno);
+
+      errno = 0;
+      pid = waitpid (child, &status, 0);
+      assert (pid == child);
+    }
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
   /* let tracee run. it must be killed very soon by SIGPIPE */
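As a rough illustration of the point argued in this message, that a tracer cannot assume the first stop it sees after PTRACE_ATTACH is the SIGSTOP, here is a minimal sketch of classifying the waitpid() status. The enum and function names are invented for this example. It is not code from attach-into-signal.c.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of the first wait status after attach.  */
enum attach_stop
{
  ATTACH_SAW_SIGSTOP,       /* the stop generated by PTRACE_ATTACH */
  ATTACH_SAW_OTHER_SIGNAL,  /* some other signal was reported first */
  ATTACH_TRACEE_DIED,       /* tracee exited or was killed meanwhile */
  ATTACH_UNEXPECTED         /* anything else, e.g. a continued status */
};

enum attach_stop
classify_attach_status (int status)
{
  if (WIFSTOPPED (status))
    return (WSTOPSIG (status) == SIGSTOP
            ? ATTACH_SAW_SIGSTOP : ATTACH_SAW_OTHER_SIGNAL);
  if (WIFSIGNALED (status) || WIFEXITED (status))
    return ATTACH_TRACEE_DIED;
  return ATTACH_UNEXPECTED;
}
```

A tracer that gets ATTACH_SAW_OTHER_SIGNAL (SIGALRM or SIGPIPE in the cases discussed here) would typically pass the signal on with PTRACE_CONT and wait again until the SIGSTOP shows up, which is what the patch above does for SIGPIPE.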
Re: utrace-ptrace detach with signal semantics
On Tue, 06 Oct 2009 19:14:28 +0200, Oleg Nesterov wrote: On 10/06, Jan Kratochvil wrote: [...] If it is not reported to the new tracer then it will be always processed by the tracee, is it right? Yes, sure. (But, just in case... if the tracer does ptrace(DETACH, SIGNR), this signr only matters if the tracee was stopped after reporting syscall or signal, otherwise SIGNR is ignored). In which specific cases can SIGNR get ignored? The whole PTRACE_DETACH will be ignored if the tracee is not stopped. SIGNR will probably be ignored if the tracee is now dead. Otherwise SIGNR should get delivered, shouldn't it? + /* SIGPIPE was still pending and it has not been yet delivered. */ + if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGPIPE) [...] Yes, I didn't verify this yet, but I think with this patch the test-case should succeed with utrace-ptrace kernel. Checked-in. Thanks, Jan
Re: utrace-ptrace detach with signal semantics
On Tue, 06 Oct 2009 22:05:16 +0200, Oleg Nesterov wrote: For example, the tracee reports PTRACE_EVENT_EXEC and stops. In this case SIGNR has no effect after PTRACE_CONT/DETACH/etc. SIGNR is not ignored after the tracee reported syscall entry/exit or signal. OK, if it is only such exceptional cases as PTRACE_EVENT_EXEC, that should not matter I think. It should work also for PTRACE_SINGLESTEP. Thanks, Jan
Re: Stopped detach/attach status
On Mon, 05 Oct 2009 04:32:08 +0200, Oleg Nesterov wrote:
> On 10/01, Jan Kratochvil wrote:
> > the ptrace-testsuite http://sourceware.org/systemtap/wiki/utrace/tests
> > currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for:
> > FAIL: detach-stopped
> > FAIL: stopped-attach-transparency
> [...]
> As for user-space, I don't really understand the second test-case,
> this again means I don't understand the supposed behaviour.

The high level goal is described at its top. Users expect that if they run `gstack PID' or `gcore PID' the target PID will be in absolutely the same state as before gstack/gcore. That means it will keep both whether it was / was not stopped and also any possible existing / non-existing pending signal for a possible future waitpid() from its real (non-ptrace) parent PID.

It is another question whether what it does is technically right, but this high level goal is hopefully valid.

> Except, stopped-attach-transparency prints "Excessive waiting SIGSTOP
> after the second attach/detach", afaics the test-case is not right here.
> attach_detach() leaves the traced threads in STOPPED state, why
> pid_notifying_sigstop() should fail?

[ Not replying to this part, I have not built a kernel with this patch now. ]

> In this case, I don't understand why stopped-attach-transparency sends
> SIGSTOP to every sub-thread. If the tracer wants to stop the thread group
> after detach, it can do
>
>   ptrace(PTRACE_DETACH, anythread, SIGSTOP);
>   for_each_other_thread(pid)
>     ptrace(PTRACE_DETACH, anythread, 0);
>
> or just
>
>   kill(SIGSTOP);
>   for_each_thread(pid)
>     ptrace(PTRACE_DETACH, anythread, 0);

OK, if this is the recommended way I can fix the testcase this way. The all-threads-being-sent-SIGSTOP way IIRC worked on linux-2.6.9 but I do not think this part of the compatibility must be kept.

Thanks, Jan
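The second variant quoted above (one kill(SIGSTOP), then PTRACE_DETACH with signal 0 for every thread) could be sketched roughly as follows. Enumerating the threads through /proc/PID/task and both function names are assumptions of this sketch, not code from stopped-attach-transparency.c, and the error handling is minimal.

```c
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Store up to MAX thread IDs of process PID into TIDS, as listed in
   /proc/PID/task.  Return the number stored, or -1 on error.  */
int
list_threads (pid_t pid, pid_t *tids, int max)
{
  char path[64];
  DIR *dir;
  struct dirent *ent;
  int n = 0;

  snprintf (path, sizeof path, "/proc/%d/task", (int) pid);
  dir = opendir (path);
  if (dir == NULL)
    return -1;
  while (n < max && (ent = readdir (dir)) != NULL)
    {
      char *end;
      long tid = strtol (ent->d_name, &end, 10);

      /* Skip "." and ".." and anything else that is not a number.  */
      if (end != ent->d_name && *end == '\0')
        tids[n++] = (pid_t) tid;
    }
  closedir (dir);
  return n;
}

/* Leave the whole thread group stopped: queue one SIGSTOP for the
   process, then detach every (already attached) thread with signal 0.
   Return 0 on success, -1 on the first failure.  */
int
detach_all_leaving_stopped (pid_t pid)
{
  pid_t tids[256];
  int i, n = list_threads (pid, tids, 256);

  if (n < 0 || kill (pid, SIGSTOP) != 0)
    return -1;
  for (i = 0; i < n; i++)
    if (ptrace (PTRACE_DETACH, tids[i], NULL, NULL) != 0)
      return -1;
  return 0;
}
```

The sketch assumes the caller is already attached to every thread it lists. A thread created between the directory scan and the detach loop would be missed.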
Stopped detach/attach status
Hi Oleg, the ptrace-testsuite http://sourceware.org/systemtap/wiki/utrace/tests currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for: FAIL: detach-stopped FAIL: stopped-attach-transparency Do you agree with the testcases and is it planned to fix them for F12? Thanks, Jan
Re: Q: what user_enable_single_step() actually means?
On Wed, 23 Sep 2009 02:36:54 +0200, Roland McGrath wrote: It would be worthwhile to cons a version of this test case that uses PTRACE_SINGLESTEP instead of PTRACE_SYSCALL. I think your situation is tickling the same issue, but we should have an empirical test. [...] I have a fix in hand that I'll send upstream before too long. But perhaps it should wait for the PTRACE_SINGLESTEP version of the test case. Seeing you already added one yourself. 2009-09-23 05:31 roland * tests/: Makefile.am (1.56), step-from-clone.c (1.1): Add step-from-clone test. Regards, Jan
Re: [PATCH 38] make sure PTRACE_CONT disables SYSCALL_EXIT report
On Fri, 18 Sep 2009 00:17:24 +0200, Roland McGrath wrote: For any test case you found useful, please add it to the ptrace-tests suite. Jan can help you get it in the right form and get it committed. Checked-in as: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/sigint-before-syscall-exit.c?cvsroot=systemtap Or if he would prefer not to keep maintaining that suite, we can get you set up on sourceware so you can do it yourself. With current utrace it is no longer a fulltime assignment so it is OK this way. Thanks, Jan
Re: attach-wait-on-stopped vs detach-stopped
Hi Roland,

thanks for your detailed explanation making the complex problem look easy.

On Fri, 08 Aug 2008 07:46:35 +0200, Roland McGrath wrote:
> In the latest upstream kernels, detach-stopped is the only ptrace-tests
> case failing. A fix I tried for that worked, but made
> attach-wait-on-stopped start failing instead. Can you tell me if you
> think the expectation in attach-wait-on-stopped really seems correct?
> It seems to be contrary to what detach-stopped wants.
>
> In attach-wait-on-stopped, this happens:
>
>   untraced child stops with normal SIGSTOP
>   parent does not wait, stopped state still to be waited for
>   parent does PTRACE_ATTACH
>     -> child still in job stop, now has pending SIGSTOP
>   parent does wait, sees it stopped with SIGSTOP (the first one)
>   parent does PEEKUSR, GETREGS (should make no difference)
>   parent does PTRACE_DETACH *
>     -> child has never left job stop, is still in job stop,
>        stays in job stop after detach, does not wake up
>   parent does PTRACE_ATTACH
>     -> child still in job stop, but has been waited for
>        still pending SIGSTOP (third one came but second one still waiting)
>   parent does wait, blocks since child is waited-for but still stopped
>
> What happened before my fix was that PTRACE_DETACH unconditionally woke
> the thread up from whatever state it was in. So here,

Just to comment: your *here* means the *-marked line.

> it woke up, saw the old pending SIGSTOP, and stopped again (ptrace
> stop)--now with a fresh still to be waited for stopped status.

My explanation: This SIGSTOP you describe was generated by PTRACE_ATTACH. As we are now after PTRACE_DETACH (with no TracerPid), when this SIGSTOP is delivered we get into the `T (stopped)' (and not the `T (tracing stop)') state.

> But this wakeup on PTRACE_DETACH was exactly what detach-stopped does
> not want to see.

attach-wait-on-stopped uses PTRACE_DETACH,0 while detach-stopped uses PTRACE_DETACH,SIGSTOP.
With the `attach-wait-on-stopped uses PTRACE_DETACH,0' testcase part I just tried to pinpoint the utrace-ptrace difference being considered a regression. Upstream GDB did not support attaching-to-stopped processes before and it still has the detach-as-stopped behavior currently undefined.
=> I am not aware it would cause any real-world problems to FAIL the second-attach case of attach-wait-on-stopped.

> So both tests can be satisfied if what it means is that PTRACE_DETACH
> always wakes up a thread (even one that has never left job control
> stop), but it should stop again for the new SIGSTOP. (The reason it
> doesn't stop again now is an esoteric internal one.) Is that what you
> think the rule ought to be?

Yes. OTOH I do not see why your way would cause any real-world troubles if you find it more systematic.

> The "if in job stop, stay in job stop" rule seems more sensible to me.
> That would make detach-stopped pass and attach-wait-on-stopped fail.
> As you're aware, the subtle difference between staying stopped and
> waking up followed by an immediate stop is the freshening of the wait
> status and wakeup of a parent/tracer's blocked wait calls.

The goal of the GDB attach-detach behavior is to be fully transparent. Running /usr/bin/gcore (GDB attach+gcore+detach commands) should leave the process in a perfectly unchanged state.

We have to eat the pending SIGSTOP notification during `attach'. With the `PTRACE_ATTACH, tkill(SIGSTOP), PTRACE_CONT(0), waitpid()' trick (recent upstream or mid-term RH/Fedora) GDB copes even with stopped processes with an already pre-eaten pending SIGSTOP notification.

I find GCORE to be more friendly by possibly generating one excessive SIGSTOP notification than by possibly eating the only remaining SIGSTOP notification. At least there are applications which run external GCORE on their SIGSTOP-ped sub-processes which may (not confirmed) expect waitpid() to give them SIGSTOP afterwards as it worked before (to be specific - RHEL-4, 2.6.9 non-utrace).
I do not know about a raceless way how to find whether the SIGSTOP notification was already pending before PTRACE_ATTACH (BTW `/proc/PID/status' content does not change on the pending/eaten notification). Therefore a wish for a possibility to PTRACE_DETACH two ways (leaving/not-leaving a pending notification) is out of question. Thanks, Jan
Re: x86_64-cs failed in x86
Hi, On Thu, 24 Jul 2008 08:38:16 +0200, Wenji Huang wrote: [tests]$ ./x86_64-cs ./x86_64-cs: WIFSTOPPED - WSTOPSIG = 4 x86_64-cs: x86_64-cs.c:160: main: Assertion `0' failed. Thanks, committed (and also that unexpected values are a PASS now - a problem would be just a kernel crash). Regards, Jan
Re: x86 single-step issues
On Wed, 09 Jul 2008 23:28:55 +0200, Roland McGrath wrote: I have some more fixes in the x86 bowels about ready to send upstream. From the status quo upstream, my changes get FAIL-PASS for step-jump-cont-strict (32 64), step-through-sigret (32). Even step-jump-cont-strict, great. Does that cover all the issues you know about? Yes, thanks. Jan
Re: Issues when attaching to stopped process
On Mon, 09 Jun 2008 19:23:09 +0200, Matthew Legendre wrote:
> We're seeing issues when trying to attach to an already stopped process
> on recent utrace kernels (seen on Fedora Core 8 and 9)--waitpid reports
> the arrival of numerous signal 0s,

Being tracked as `stop-attach-then-wait' at: http://sourceware.org/systemtap/wiki/utrace/tests

While you are right that the ptrace-on-utrace emulation is currently incompatible with ptrace, for proper behavior during later ptrace operations you should resume the job control stop first [attached]. It is code from a GDB patch at: http://sourceware.org/ml/gdb-patches/2008-05/msg00022.html

Thanks, Jan

--- test_attached_to_stopped.c 2008-06-10 23:36:54.0 +0200
+++ test_attached_to_stopped-jk.c 2008-06-11 19:19:08.0 +0200
@@ -19,6 +19,38 @@ void stop_self()
    kill(getpid(), SIGSTOP);
 }
 
+/* gdb/linux-nat.c */
+/* Detect `T (stopped)' in `/proc/PID/status'.
+   Other states including `T (tracing stop)' are reported as false.  */
+
+static int
+pid_is_stopped (pid_t pid)
+{
+  FILE *status_file;
+  char buf[100];
+  int retval = 0;
+
+  snprintf (buf, sizeof (buf), "/proc/%d/status", (int) pid);
+  status_file = fopen (buf, "r");
+  if (status_file != NULL)
+    {
+      int have_state = 0;
+
+      while (fgets (buf, sizeof (buf), status_file))
+        {
+          if (strncmp (buf, "State:", 6) == 0)
+            {
+              have_state = 1;
+              break;
+            }
+        }
+      if (have_state && strstr (buf, "T (stopped)") != NULL)
+        retval = 1;
+      fclose (status_file);
+    }
+  return retval;
+}
+
 int attach_then_run(void (*func)(void))
 {
    int pid, result;
@@ -32,6 +64,28 @@ int attach_then_run(void (*func)(void))
       perror("Ptrace attach error");
       exit(-1);
    }
+   if (pid_is_stopped(pid)) {
+      /* gdb/linux-nat.c */
+
+      /* The process is definitely stopped.  It is in a job control
+         stop, unless the kernel predates the TASK_STOPPED /
+         TASK_TRACED distinction, in which case it might be in a
+         ptrace stop.  Make sure it is in a ptrace stop; from there we
+         can kill it, signal it, et cetera.
+
+         First make sure there is a pending SIGSTOP.  Since we are
+         already attached, the process can not transition from stopped
+         to running without a PTRACE_CONT; so we know this signal will
+         go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
+         probably already in the queue (unless this kernel is old
+         enough to use TASK_STOPPED for ptrace stops); but since SIGSTOP
+         is not an RT signal, it can only be queued once.  */
+      kill (pid, SIGSTOP);	/* tgkill() is required for threads!  */
+
+      /* Finally, resume the stopped process.  This will deliver the SIGSTOP
+         (or a higher priority signal, just like normal PTRACE_ATTACH).  */
+      ptrace (PTRACE_CONT, pid, 0, 0);
+   }
    return pid;
 }
Re: Is PTRACE_SINGLEBLOCK buggy?
On Mon, 02 Jun 2008 11:09:56 +0200, Renzo Davoli wrote:
> Jan Kratochvil has just sent me an E-mail saying that it seems to be a
> kvm bug (or a bug caused by kvm).

KVM bug details at https://bugzilla.redhat.com/show_bug.cgi?id=437028 .

> He is right: using qemu/kqemu instead of kvm it does not panic.
>
> Anyway I am puzzled. Using kvm the PTRACE_SINGLEBLOCK should have the
> same effect on 2.6.25.4 and 2.6.25.4+utrace.
>
> 2.6.25.4:
>   ptrace_resume(kernel/ptrace.c)
>   -> user_enable_block_step
> 2.6.25.4+utrace:
>   ptrace_common(kernel/ptrace.c) sets UTRACE_ACTION_BLOCKSTEP
>   -> utrace_quiescent(kernel/utrace.c) tests UTRACE_ACTION_BLOCKSTEP
>   -> user_enable_block_step
>
> I wonder where is the difference...

Just FYI on 2.6.25 I still get the crash,
host:
  kernel: kvm: 19661: cpu0 unhandled wrmsr: 0x1d9 data 2
  kernel-2.6.25.3-18.fc9.x86_64
  kvm-65-7.fc9.x86_64
guest: vanilla 2.6.25 x86_64
  Pid: 1945, comm: block-step Not tainted 2.6.25-0.101.rc4.git3.fc8 #1
  RIP: 0010:[8100ab79] [8100ab79] __switch_to+0x218/0x2bc
  (the version number is for a RPM-built vanilla kernel)

(I did not find any ptrace patches in between 2.6.25 and 2.6.25.4.)

Regards, Jan
ptrace testsuite: reparent-zombie* race
Hi Roland,

I randomly get a race in reparent-zombie:
  reparent-zombie.c:88: create_zombie: Assertion `fd != -1' failed.
  Aborted
on kernel-2.6.25.3-18.fc9.x86_64.

I hope the attached patch is right (tested only for reparent-zombie.c as reparent-zombie-clone.c is crashing the kernel).

Best Regards, Jan

--- tests/reparent-zombie.c 2 May 2008 01:27:20 -0000 1.1
+++ tests/reparent-zombie.c 2 Jun 2008 12:40:01 -0000
@@ -78,15 +78,19 @@ create_zombie (void)
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGUSR1);
 
+  /* We must open the status file first as if CHILD would finish in between
+     TRACE_CONT and this OPEN we would fail with ENOSRCH as no zombie is left
+     as we have set the SIGCHLD handler to SIG_IGN (kernel reaps the died
+     children without creating any zombies.  */
+  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
+  fd = open (buf, O_RDONLY);
+  assert (fd != -1);
+
   errno = 0;
   l = ptrace (PTRACE_CONT, child, 0l, 0l);
   assert_perror (errno);
   assert (l == 0);
 
-  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
-  fd = open (buf, O_RDONLY);
-  assert (fd != -1);
-
   do
     {
       sched_yield ();
@@ -173,6 +177,8 @@ main (void)
 
   signal (SIGABRT, handler_fail);
   signal (SIGALRM, handler_fail);
+  /* SIG_IGN as we want no zombies left - kernel reaps the died children
+     without creating any zombies.  */
   signal (SIGCHLD, SIG_IGN);
 
   fd = create_zombie ();
--- tests/reparent-zombie-clone.c 2 May 2008 01:27:20 -0000 1.1
+++ tests/reparent-zombie-clone.c 2 Jun 2008 12:44:15 -0000
@@ -123,6 +123,14 @@ create_zombie (void)
   assert (WIFSTOPPED (status));
   assert (WSTOPSIG (status) == SIGSTOP);
 
+  /* We must open the status file first as if MSG would finish in between
+     TRACE_CONT and this OPEN we would fail with ENOSRCH as no zombie is left
+     as we have set the SIGCHLD handler to SIG_IGN (kernel reaps the died
+     children without creating any zombies.  */
+  snprintf (buf, sizeof buf, "/proc/%d/status", (int) msg);
+  fd = open (buf, O_RDONLY);
+  assert (fd != -1);
+
   errno = 0;
   l = ptrace (PTRACE_CONT, msg, 0l, 0l);
   assert_perror (errno);
@@ -135,10 +143,6 @@ create_zombie (void)
 
   child = msg;
 
-  snprintf (buf, sizeof buf, "/proc/%d/status", (int) child);
-  fd = open (buf, O_RDONLY);
-  assert (fd != -1);
-
   do
     {
       sched_yield ();
@@ -225,6 +229,8 @@ main (void)
 
   signal (SIGABRT, handler_fail);
   signal (SIGALRM, handler_fail);
+  /* SIG_IGN as we want no zombies left - kernel reaps the died children
+     without creating any zombies.  */
   signal (SIGCHLD, SIG_IGN);
 
   fd = create_zombie ();
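The ordering that the patch establishes (open /proc/PID/status while the tracee is still stopped, and only then resume it) could be written as a small helper along these lines. This is only a sketch: the function names and the callback interface are invented here and do not appear in reparent-zombie.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Resume a ptrace-stopped tracee without injecting a signal.
   Return 0 on success, -1 on failure.  */
int
resume_with_ptrace_cont (pid_t pid)
{
  return ptrace (PTRACE_CONT, pid, 0L, 0L) == 0 ? 0 : -1;
}

/* Resume callback that does nothing, for trying the helper out on a
   process that is not being traced.  */
int
resume_noop (pid_t pid)
{
  (void) pid;
  return 0;
}

/* Open /proc/PID/status first and call RESUME only afterwards, so the
   open cannot race with PID exiting and being reaped by the kernel.
   Return the open descriptor, or -1 on failure.  */
int
open_status_then_resume (pid_t pid, int (*resume) (pid_t))
{
  char path[64];
  int fd;

  snprintf (path, sizeof path, "/proc/%d/status", (int) pid);
  fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;
  if (resume (pid) != 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

In the test the call would be `fd = open_status_then_resume (child, resume_with_ptrace_cont);`. The descriptor stays valid after the child is gone, so the later polling loop no longer depends on a zombie being left behind.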
Re: Tests about bug step-jump-cont
Hi Wenji,

while I cannot comment on your kernel code analysis, the testcase was definitely broken since 2008-02-03 - it never PASSed. It should be fixed now.

  /* We must set PC to our new function as the current PC stays in the glibc
     function RAISE no matter which part of the code called it - we would have
     to save and restore the whole stack for a proper restart of the code.  */

I was not sure of its correctness, sorry for the delay.

Regards, Jan

On Thu, 13 Mar 2008 10:25:04 +0100, Wenji Huang wrote:
> Hi,
>
> I made tests of step-jump-cont (utrace wiki page) on i686 and x86_64
> with upstream 2.6.24 kernel. They have different behaviors. With help
> of assert statement and stap script, I got the following understandings:
>
> For i686:
> 1. Wait child stop upon SIGUSR1
> 2. Set singlestep on child : child->ptrace |= PT_DTRACE
>    regs->eflags |= TRAP_FLAG
> 3. Change child regs->eflags |= TRAP_FLAG
> 4. Continue the child and clear child->ptrace and regs->eflags due to
>    passed checking child->ptrace
> 5. Wait child stop, got signal SIGUSR2
> 6. Change the child regs->eflags |= TRAP_FLAG
> 7. Continue the child, but couldn't clear regs->eflags due to failed
>    checking child->ptrace
> 8. Wait child, but got signal SIGTRAP due to eflags (Child stop on
>    sending SIGUSR2)
>
> For x86_64:
> 1. Wait child stop upon SIGUSR1
> 2. Set singlestep on child : child->ptrace |= PT_DTRACE
>    regs->eflags |= TRAP_FLAG.
>    (*** But these are missing after the syscall ***)
> 3. Change child regs->eflags |= TRAP_FLAG
> 4. Continue the child, but couldn't clear regs->eflags due to failed
>    checking child->ptrace
> 5. Wait child, but got signal SIGTRAP due to eflags (Child stop on
>    sending SIGUSR1).
>
> So I think it may be correct in i686 case, just need to change
> testcase. But it looks like there are some problems in x86_64 code.
>
> Regards,
> Wenji
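For readers following the "regs eflags |= TRAP_FLAG" steps in the quoted analysis: on x86 the trap flag (TF) is bit 8 of EFLAGS, i.e. 0x100. A tracer could set it in a stopped tracee roughly as sketched below. The function names are invented for this illustration, and the real step-jump-cont test may well do this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* TF (trap flag) is bit 8 of the x86 EFLAGS register.  */
#define EFLAGS_TF 0x100UL

/* Pure helpers operating on a saved EFLAGS value.  */
unsigned long
set_trap_flag (unsigned long eflags)
{
  return eflags | EFLAGS_TF;
}

unsigned long
clear_trap_flag (unsigned long eflags)
{
  return eflags & ~EFLAGS_TF;
}

#if defined __i386__ || defined __x86_64__
/* Set TF in the saved registers of a ptrace-stopped tracee PID.
   Return 0 on success, -1 on failure.  x86 only.  */
int
tracee_set_trap_flag (pid_t pid)
{
  struct user_regs_struct regs;

  if (ptrace (PTRACE_GETREGS, pid, NULL, &regs) != 0)
    return -1;
  regs.eflags = set_trap_flag (regs.eflags);
  return ptrace (PTRACE_SETREGS, pid, NULL, &regs) != 0 ? -1 : 0;
}
#endif
```

Whether the kernel then clears TF again on PTRACE_CONT is exactly the i686 versus x86_64 difference described in the analysis.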