Re: gdbstub initial code
On Mon, Jul 12, 2010 at 08:37:29PM +0200, Oleg Nesterov wrote: Hello. Please see the attachment. Don't take this code seriously, this is the early prototype and everything should be rewritten. It barely uses utrace, only to stop the target. (gdb) file /path/to/binary (gdb) target extended-remote /proc/ugdb (gdb) attach PID (gdb) disassemble _start (gdb) bt (gdb) info registers (gdb) info threads (gdb) detach This seems to work, but I had to export access_process_vm(). Currently it only attaches to the single thread. vCont or ^C doesn't work. I still can't understand what utrace_xxx_pid() buys us, and I still think that utrace_prepare_examine() can't protect the task even for regset calls. IMHO, if this is the start of another stab at getting utrace in the upstream kernel, you may want to consider Linus' problem statement for utrace -- infrastructure that will allow strace and gdb on the same thread at the same time. OTOH, the Tom Tromey alluded on lkml that if kernel provides infrastructure that allows for breakpoint assistance and 'displaced' stepping, with a suitable user interface, preferably a ptrace variant that is fd based, they'll migrate gdb to using the interface. (The bp assistance and displaced stepping can be provided with extensions to the current uprobes set under review). In any case, its good to see a restart of this effort. Though there has been support for gdbstub in the past, an overwhelming majority of people would like to see a 'user interface', be it ptrace2 or PTRACE_ATTACH2 or whatever it needs to be called. Given these requirements, and given that the 'new' uprobes is close to -tip, would it be more useful to pursue an alternate syscall approach rather than gdbstub? Ananth
Re: linux-next: add utrace tree
On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote: * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: ... When we merged kprobes ~10 years ago we made the (rather bad) mistake of merging a raw, opaque facility and leaving 'the rest' up to some other entity. IBM kprobes hackers vanished the day the original kprobes code went upstream and the high level entity never truly materialized in-kernel, for nearly a decade! I don't know what you are referring to here... Kprobes was merged in 2.6.9 (~August 2004 -- less than 6 years ago). [...] Ok, 6 years then :-) [...] Since then, we did work on ports to powerpc and s390. We implemented kretprobes. We made it much scalable using RCU; we did the powerpc booster to skip single-step when possible, not to mention various bug fixes over the years. Except it had no real in-kernel user. Not that I want to rebut you Ingo, but there were in-kernel users since 2006 (net/ipv4/tcp_probe.c) :-) Aside, I am also glad that we have more flexibility with the perf integration. Ananth
Re: linux-next: add utrace tree
On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote: ... Lets compare the two cases via a drawing. Your current uprobes submission does: [kernel] do probe thing single-step trap ^| ^ | |v | v [user] INT3XOL-ins next ins-stream ( add the need for serialization to make sure the whole single-step thing does not get out of sync with reality. ) And emulator approach would do: [kernel] emul-demux-fastpath, do probe thing ^ | | v [user] INT3 next ins-stream far simpler conceptually, and faster as well, because it's one kernel entry. Ingo, Yes, conceptually, emulation is simpler. In fact, it may even be the right thing to do from a housekeeping POV if gdb were enabled to use breakpoint assistance in the kernel. However... emulation is not easy. Just quoting Peter Anvin: On the more general rule of interpretation: I'm really concerned about having a bunch of partially-capable x86 interpreters all over the kernel. x86 is *hard* to emulate, and it will only get harder as the architecture evolves. -hpa Yes, I know you suggested we start with a small subset. We already have an implementation of instruction emulation in kernel for x86 and powerpc, but its too KVM centric. If there is a generic emulation layer, we would use it. There are conflicting opinions for either case; complicated as it is, the XOL scheme works and, to a large extent, it is easily extendable to other architectures compared to the emulation approach. Uprobes can be made to use emulation when possible/available, but I don't think this should be gating decision for the initial implementation of the feature. Ananth
Re: linux-next: add utrace tree
On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote: On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: On Wed, 27 Jan 2010, Peter Zijlstra wrote: Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream. Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the overwrite only the first byte with 'int3' made them go umm, I need to talk to some core CPU people to see if that's ok. They mumble about possible CPU errata, I$ coherency, instruction retry etc. I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all. Right, so there's two aspects: 1) concurrency when inserting the probe 2) concurrency when hitting the probe 1) used to be dealt with by using utrace to stop all threads in the process and then writing the instruction. I suggested to CoW the page, modify the instruction, set the pagetable and flush tlbs at full speed -- the very thing you suggest here. 2) so traditionally (and the intel arch manual describes this) is to replace the instruction, single step it, and write the probe back. This is racy for multi-threading. The current uprobes stuff solves this by doing single-step-out-of-line (XOL). XOL injects a new vma into the target process and puts the old instruction there, then it single steps on the new location, leaving the original site with INT3. This doesn't work for things like RIP relative instructions, so uprobes considers them un-probable. Probing RIP-relative instructions work just fine; there are fixups that take care of it. Also, I myself really object to inserting a vma in a running process, its like a land-lord, sure he has the key but he won't come in an poke through your things. The alternative is to place the instruction in TLS or stack space, since each thread can only have a single trap at a time, you only need space for 1 instruction (plus a possible jump out to the original site). There is the 'problem' of marking the TLS/stack executable when being probed. Then there is the whole emulation angle, the uprobes people basically say its too much effort to write a x86 emulator. We don't need to write one. I don't know how easy it is to make the kvm emulator less kvm-centric (vcpus, kvm_context, etc). Avi? Ananth
Re: linux-next: add utrace tree
On Wed, Jan 27, 2010 at 12:08:31PM +0100, Peter Zijlstra wrote: On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote: Probing RIP-relative instructions work just fine; there are fixups that take care of it. Ah my bad then, it was my understanding you simply bailed on those. Just for my information, how large are the replacement sequences? The RIP relative instruction is transformed into indirect addressing mode using a scratch register. For details http://marc.info/?l=linux-kernelm=126401936114639w=2. Ananth
Re: linux-next: add utrace tree
On Mon, Jan 25, 2010 at 01:41:57PM -0800, Linus Torvalds wrote: On Mon, 25 Jan 2010, Tom Tromey wrote: ... * Support displaced stepping in the kernel; I think this would improve performance when debugging in non-stop mode. Don't we already do that at least on x86? Just doing a single-step should work on an instruction even if it has a breakpoint on it, because we set the TF bit. Or maybe I'm not understanding what displaced stepping means to you. If Tom is referring to supporting single-stepping out of line, ie., not putting back the original instruction at the bp location, yes, we already support it on various architectures for kernel breakpoints, through the kprobes infrastructure. For userspace, there are more complications to take care of. We are reworking a prototype based on community comments (see the long UBP/XOL thread on lkml from a few days ago). Hopefully the userspace breakpoint assistance layer will be generic enough for gdb to also take advantage of, though the interface details need to be hashed out. Ananth
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 12:23:33PM +0100, Ingo Molnar wrote: * Kyle Moffett k...@moffetthome.net wrote: On Fri, Jan 22, 2010 at 19:22, Linus Torvalds torva...@linux-foundation.org wrote: ... In that sense it might be better to fix/enhance ptrace, if there's interest. I've written a handful of ptrace extensions in the past (none of them went upstream tho), it can be done in a useful manner and the code is pretty hackable. There are basic problems left to be solved: for example why is there still no 'memory block copy' call, why are we _still_ limited to one word per system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has PTRACE_WRITE*/READ* support that implements this, but none of the other architectures have it so it's essentially unused. Or another possible direction would be to extend the perf events syscall with interception capabilities. It's far more performant at extracting application state without scheduling than any ptrace method - and interception/injection would be a natural next step - if there's interest. This certainly is now a chicken and egg problem. Everybody agrees that Linux needs something better than ptrace; legacy ptrace will continue to live, so will utilities written to it (strace, etc). But should that limit what Linux can offer? What's the way out? - Enhance ptrace: At least one ptrace maintainer (Roland) had publically stated he doesn't prefer enhancing legacy ptrace -- that its already a beast to maintain, and adding more complexity to it does it no good. - Extend perf; would perf then use utrace underneath? Or would one have to redo some of what utrace already does for thread level control? - Give utrace a syscall and make it the primary way for users to interact with the layer. There are benefits to this if there is agreement on the utrace layer itself, maybe with less fexibility than what it currently offers? If yes, what should it look like? Any new debug facility will have to incorporate some or most learnings from what utrace tried to address. It would be sad to just dump utrace and redo everything from scratch or band-aid existing interfaces. Ananth
Re: linux-next: add utrace tree
On Thu, Jan 21, 2010 at 05:28:42PM -0800, Linus Torvalds wrote: On Thu, 21 Jan 2010, Andrew Morton wrote: ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the the past few years. More importantly, we're not ever going to get rid of it. FWIW, Oleg's implementation of ptrace over utrace is 100% compatible with legacy ptrace; gdb testsuite indicates that (http://lkml.org/lkml/2009/12/21/98). Ananth
Re: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]
On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote: Here is a summary of the Comments and actions that need to be taken for the current uprobes patchset. Please let me know if I missed or misunderstood any of your comments. 1. Uprobes depends on trap signal. Uprobes depends on trap signal rather than hooking to the global die notifier. It was suggested that we hook to the global die notifier. In the next version of patches, Uprobes will use the global die notifier and look at the per-task count of the probes in use to see if it has to be consumed. However this would reduce the ability of uprobe handlers to sleep. Since we are dealing with userspace, sleeping in handlers would have been a good feature. We are looking at ways to get around this limitation. We could set a TIF_ flag in the notifier to indicate a breakpoint hit and process it in task context before the task heads into userspace. Ananth
Re: linux-next: add utrace tree
On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: Ingo, Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, etc.) can go uptream in its present form. Agreed, uprobes is still not upstream ready -- it was an RFC. We are working through the comments there to get it ready for merger. IMHO the far more important thing to address beyond formalities and workflow cleanliness are the (many) technical observations and objections offered by Peter Zijstra on lkml. Not just the git history but also the abstractions and concepts are messy and should be reworked IMO, and also good and working perf events integration should be achieved, etc. I think Oleg addressed most of Peter's concerns on utrace when the ptrace/utrace patchset was reposted. Perf integration with uprobes will be done and discussions have started with Masami and Frederic. There are a couple of fundamental technical aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not through quiesce) that need resolution. The fact that there's a well established upstream workflow for instrumentation patches, which is being routed around by the utrace/uprobes/systemtap code here is not a good sign in terms of reaching a good upstream solution. Lets hope it works out well though. Agreed. On the other hand, having ptrace/utrace in the -next tree will give it a lot more testing, while any outstanding technical issues are being addressed. Stephen, To exercise ptrace/utrace, it would be very useful if you pulled in git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch utrace-ptrace instead of 'master'. Thanks, Ananth
Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
On Mon, Jan 18, 2010 at 02:13:25PM +0200, Pekka Enberg wrote: Hi Avi, On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: Maybe you place no value on uprobes. But people who debug userspace likely will see a reason. On 01/18/2010 02:06 PM, Peter Zijlstra wrote: I do see value in uprobes, I just don't like it mucking about with the address space. Nor does it appear required. On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity a...@redhat.com wrote: Well, the alternatives are very unappealing. Emulation and single-stepping are going to be very slow compared to a couple of jumps. So how big chunks of the address space are we talking here for uprobes? As Srikar mentioned, the least we start with is 1 page. Though you can have as many probes as you want, there are certain optimizations we can do, depending on the most common usecases. For eg., if you'd consider the start of a routine to be the most commonly traced location, most routines in a binary would generally start with the same instruction (say push %ebp), and we can refcount a slot with that instruction to be used for all probes of the same instruction. Ananth
Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
On Mon, Jan 18, 2010 at 06:52:32PM +0200, Avi Kivity wrote: On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote: Well, the alternatives are very unappealing. Emulation and single-stepping are going to be very slow compared to a couple of jumps. So how big chunks of the address space are we talking here for uprobes? As Srikar mentioned, the least we start with is 1 page. Though you can have as many probes as you want, there are certain optimizations we can do, depending on the most common usecases. For eg., if you'd consider the start of a routine to be the most commonly traced location, most routines in a binary would generally start with the same instruction (say push %ebp), and we can refcount a slot with that instruction to be used for all probes of the same instruction. But then you can't follow the instruction with a jump back to the code... Right. This will work only for the non boosted case where single-stepping is mandatory. I guess the tradeoff is vma space and speed. Ananth
Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote: On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: discussed elsewhere. Thanks for the pointer... :-) Peter, I think Jim was referring to http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html Ananth
Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote: On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote: Ideas? emulate the one instruction? In kernel? Generically? Don't think its that easy for userspace -- you have the full gamut of instructions to emulate (fp, vector, etc); further, the instruction could itself cause a page fault and the like.
Re: [RFC] [PATCH 4/7] Uprobes Implementation
On Tue, Jan 12, 2010 at 06:36:00AM +0100, Frederic Weisbecker wrote: On Mon, Jan 11, 2010 at 05:55:53PM +0530, Srikar Dronamraju wrote: +static const struct utrace_engine_ops uprobe_utrace_ops = { + .report_quiesce = uprobe_report_quiesce, + .report_signal = uprobe_report_signal, + .report_exit = uprobe_report_exit, + .report_clone = uprobe_report_clone, + .report_exec = uprobe_report_exec +}; So, as stated before, uprobe seems to handle too much standalone policies such as freeing on exec, always inherit on clone and never on fork. Such rules should be decided from uprobe clients not from uprobe itself and that makes it not enough flexible to be usable for now. The freeing on exec is only housekeeping of uprobe data structures. And probepoints are inherited only on CLONE_THREAD and not otherwise, simply since the existing probes can be hit in the new thread's context. Not sure what other policy you are hinting at. For example if we want it to be usable by perf, we have two ways: - a trace event. Unfortunately, like I explained in a previous mail, this doesn't seem to be a suitable interface for this particular case. - a performance monitoring unit, with the existing unified interface struct pmu, usable by perf. Typically, to use it with perf toward a pmu, perf tools need to create a uprobe on perf process and activate its hook on the next exec. Thereafter, it's up to perf to decide if we inherit through clone and fork. As mentioned above, the inheritance is only for threads. It should be fairly easy to inherit probes on fork, and that can be made a perf based policy decision. Here I fear utrace and perf are going to collide. Utrace does not mandate any of the above concerns you've mentioned. Utrace just provides callbacks at the said events and uprobes can be tweaked to accommodate perf's requirements as possible, as feasible. See how could be the final struct pmu (we need to extend it to support utrace): struct pmu { enable() - called we schedule in a context where we want a uprobe to be active. Called very often disable() - the above opposite /* Not yet existing callbacks */ hook_task() - called when a process is created which we want to activate our hook would be typically called once on exec if we have set enable_on_exec and also on clone()/fork() if we want to inherit. } The above hook_task (could be divided in more precise callback events like hook_on_exec, hook_on_clone, etc...) would be needed by perf to drive correctly utrace and this is going to collide with utrace callbacks that notify execs and forks. Probably utrace can be kept for all the utrace breakpoint signal handling an so. But I guess the rest can be implemented on top of a struct pmu and driven by perf like we did with hardware breakpoints re-implementation. Just an idea. Well, I wonder if perf can ride on utrace's callbacks for the hook_task() for the clone/fork cases? Ananth
Re: [PATCH 6/7] implement utrace-ptrace
On Mon, Dec 21, 2009 at 07:18:37PM +0530, Ananth N Mavinakayanahalli wrote: On Fri, Dec 18, 2009 at 02:11:40AM +0100, Oleg Nesterov wrote: The patch adds the new file, kernel/ptrace-utrace.c, which contains the new implementation of ptrace over utrace. This file is not compiled until we have CONFIG_UTRACE option, will be added by the next utrace core patch. It's supposed to be an invisible implementation change, nothing should change to userland when CONFIG_UTRACE is enabled. Signed-off-by: Roland McGrath rol...@redhat.com Signed-off-by: Oleg Nesterov o...@redhat.com --- Oleg, ptrace/utrace performs better with the ptrace-tests [1] testsuite (1 failure vs. 3 with vanilla ptrace [1]). The gdb testsuite also has no regressions. In fact, the results on the gdb testsuite are identical. Forgot to mention the tests were on powerpc.
Re: [PATCH 6/7] implement utrace-ptrace
On Fri, Dec 18, 2009 at 02:11:40AM +0100, Oleg Nesterov wrote: The patch adds the new file, kernel/ptrace-utrace.c, which contains the new implementation of ptrace over utrace. This file is not compiled until we have CONFIG_UTRACE option, will be added by the next utrace core patch. It's supposed to be an invisible implementation change, nothing should change to userland when CONFIG_UTRACE is enabled. Signed-off-by: Roland McGrath rol...@redhat.com Signed-off-by: Oleg Nesterov o...@redhat.com --- Oleg, ptrace/utrace performs better with the ptrace-tests [1] testsuite (1 failure vs. 3 with vanilla ptrace [1]). The gdb testsuite also has no regressions. In fact, the results on the gdb testsuite are identical. Ananth [1] http://sourceware.org/systemtap/wiki/utrace/tests [2] detach-stopped and stopped-attach-transparency are the additional failures. syscall-reset fails in both cases.
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Mon, Dec 07, 2009 at 01:43:27PM +0100, Oleg Nesterov wrote: On 12/06, CAI Qian wrote: Ananth, could you please confirm once again that step-jump-cont (from ptrace-tests testsuite) not fail on your machine? If yes, please tell me the version of glibc/gcc. Is PTRACE_GETREGS defined on your machine? Hi Oleg, It works for me on a Fedora 12 machine. [ana...@mjs22lp1 ptrace-tests]$ gcc --version gcc (GCC) 4.4.2 20091027 (Red Hat 4.4.2-7) [ana...@mjs22lp1 ptrace-tests]$ rpm -qa |grep glibc glibc-common-2.11-2.ppc glibc-2.11-2.ppc64 glibc-devel-2.11-2.ppc glibc-static-2.11-2.ppc glibc-2.11-2.ppc glibc-devel-2.11-2.ppc64 glibc-headers-2.11-2.ppc And yes, PTRACE_GETREGS is defined in /usr/include/asm/ptrace.h Ananth
Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)
On Mon, Dec 07, 2009 at 07:05:40PM +0100, Oleg Nesterov wrote: On 12/07, Oleg Nesterov wrote: On 12/07, Jan Kratochvil wrote: On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote: But. raise_sigusr2 is not equal to the actual address of raise_sigusr2(), this value points to the thunk (I do not know the correct English term) ppc64 calls it function descriptor (GDB ppc64_linux_convert_from_func_ptr_addr): For PPC64, a function descriptor is a TOC entry, Thanks Jan. in a data section, Yes! Now I can't understand how this test-case could ever work on ppc. step-jump-cont does: regs-nip = raise_sigusr2; --- points to data section ptrace(PTRACE_CONT); of course, the tracee gets SIGSEGV, this section is not executable. Hmm. Looks like, powerpc means a lot of different hardware, and _PAGE_EXEC may be 0. I didn't notice this when I quickly grepped arch/powerpc/ IOW, perhaps on some machines r implies x ? Is yes, this can explain why the results differ on different machines. Well, powerpc 32-bit adheres to the SVR4 ABI, while powerpc 64-bit uses the PPC64-ELF ABI (http://refspecs.linuxfoundation.org/ELF/ppc64/). The 64bit ABI uses function descriptors and the 'func_name' is the data address, while the '.func_name' is the text address. (See handle_rt_signal64 in arch/powerpc/kernel/signal_64.c and kprobe_lookup_name in arch/powerpc/include/asm/kprobes.h. Ananth
[QUERY] signal_struct-count/live
Oleg, I am confused as to why we need two atomics count and live in signal_struct. report_death() uses -live as the group_dead indicator, while there are places (like the scheduler) which uses -count as the nr_threads indicator. I tried git blame to see if it remembers why, but the addition predates 2.6.12 and so it does not know. Could you please shed some light on this? Ananth
Re: [QUERY] signal_struct-count/live
On Fri, Nov 27, 2009 at 04:15:21PM +0100, Oleg Nesterov wrote: On 11/27, Ananth N Mavinakayanahalli wrote: I am confused as to why we need two atomics count and live in signal_struct. report_death() uses -live as the group_dead indicator, report_death? Perhaps you meant do_exit() ? Right, do_exit() and that is what is picked up by tracehook_report_death(), and in turn by report_death(). while there are places (like the scheduler) which uses -count as the nr_threads indicator. I tried git blame to see if it remembers why, but the addition predates 2.6.12 and so it does not know. Could you please shed some light on this? In short: signal-count must die. I was going to do this a long ago but never had the time. See also 4ab6c08336535f8c8e42cf45d7adeda882eff06e commit, this is the first step. Last time I did the grepping almost any usage of signal-count is not right. For example, __exit_signal() is correct, but it doesn't need to use -count. Except: it is needed for things like get_nr_threads() in proc. In short: never use signal-count ;) Thanks for the clarification Oleg. Ananth
Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)
On Fri, Nov 27, 2009 at 06:46:27PM +0100, Veaceslav Falico wrote: On Thu, Nov 26, 2009 at 11:37:03PM +0100, Oleg Nesterov wrote: Could you look at this ptrace-copy_process-should-disable-stepping.patch http://marc.info/?l=linux-mm-commitsm=125789789322573 patch? It is not clear to me how we can modify the test-case to verify it fixes the original problem for powerpc. I modified the test-case, it confirms that ptrace-copy_process-should-disable-stepping.patch fixes the problem with TIF_SINGLESTEP copied by fork() on powerpc. Probably we need a similar fix for step-fork.c in ptrace-tests. Modified the original testcase to call fork via syscall(__NR_fork), to avoid the looping inside libc's fork() on powerpc. The parent singlesteps until he sees that the child has forked, after that the parent PTRACE_CONTs until the child exits. Thanks Veaceslav. This works: Index: ptrace-tests/tests/step-fork.c === --- ptrace-tests.orig/tests/step-fork.c +++ ptrace-tests/tests/step-fork.c @@ -29,6 +29,7 @@ #include unistd.h #include sys/wait.h #include string.h +#include sys/syscall.h #include signal.h #ifndef PTRACE_SINGLESTEP @@ -78,7 +79,7 @@ main (int argc, char **argv) sigprocmask (SIG_BLOCK, mask, NULL); ptrace (PTRACE_TRACEME); raise (SIGUSR1); - if (fork () == 0) + if (syscall(__NR_fork) == 0) { read (-1, NULL, 0); _exit (22); Oleg, With the above patch applied, syscall-reset is the only failure I see on powerpc: errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset ... 1 of 40 tests failed (11 tests were not run) Please report to utrace-devel@redhat.com Ananth
Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)
On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote: I changed the subject. This bug has nothing to do with utrace, the kernel fails with or without these changes. On 11/26, Ananth N Mavinakayanahalli wrote: On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote: On 11/25, Ananth N Mavinakayanahalli wrote: step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 24803 Aborted ${dir}$tst FAIL: step-fork This is expected. Should be fixed by ptrace-copy_process-should-disable-stepping.patch in -mm tree. (I am attaching this patch below just in case) I din't mention this patch in this series because this bug is ortogonal to utrace/ptrace. The patch doesn't seem to fix the issue on powerpc: step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 17325 Aborted ${dir}$tst FAIL: step-fork Good to know, thanks again Ananth. I'll take a look. Since I know nothing about powerpc, I can't promise the quick fix ;) The bug was found by code inspection, but the fix is not trivial because it depends on arch/, and it turns out the arch-independent fix in ptrace-copy_process-should-disable-stepping.patch http://marc.info/?l=linux-mm-commitsm=125789789322573 doesn't work. Ananth, could you please run the test-case from the changelog below ? I do not really expect this can help, but just in case. Right, it doesn't help :-( GDB shows that the parent is forever struck at wait(). Ananth
Re: [RFC,PATCH 0/14] utrace/ptrace
On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote: Hello. This is the new iteration of Roland's utrace patch, this time with rewrite-ptrace-via-utrace + cleanups in utrace core. 1-7 are already in -mm tree, I am sending them to simplify the review. 8-12 don not change the behaviour, simple preparations. 13-14 add utrace-ptrace and utrace Oleg, I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace and then with ptrace/utrace. The results for ptrace/utrace look better :-) All tests are 'make check xcheck'. Ananth [1] cvs -d :pserver:anoncvs:anon...@sources.redhat.com:/cvs/systemtap co ptrace-tests - Vanilla ptrace: PASS: ptrace-on-job-control-stopped PASS: attach-wait-on-stopped PASS: detach-can-signal PASS: attach-into-signal PASS: attach-sigcont-wait PASS: sa-resethand-on-cont-signal PASS: ptrace_cont-defeats-sigblock PASS: ptrace-cont-sigstop-detach PASS: ptrace_event_clone PASS: tif-syscall-trace-after-detach PASS: event-exit-proc-maps PASS: event-exit-proc-environ SKIP: x86_64-ia32-gs SKIP: x86_64-gsbase PASS: powerpc-altivec PASS: peekpokeusr PASS: watchpoint PASS: block-step PASS: step-jump-cont SKIP: step-jump-cont-strict PASS: ppc-dabr-race PASS: signal-loss PASS: step-into-handler SKIP: user-area-access PASS: user-regs-peekpoke PASS: erestartsys SKIP: erestart-debugger SKIP: step-to-breakpoint errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset PASS: reparent-zombie PASS: step-simple SKIP: step-through-sigret PASS: stop-attach-then-wait FAIL: detach-stopped PASS: detach-stopped-rhel5 PASS: clone-multi-ptrace PASS: clone-ptrace PASS: o_tracevfork PASS: o_tracevforkdone PASS: detach-parting-signal PASS: detach-sigkill-race PASS: waitpid-double-report PASS: o_tracevfork-parent PASS: stopped-detach-sleeping FAIL: stopped-attach-transparency SKIP: erestartsys-trap SKIP: highmem-debugger PASS: sigint-before-syscall-exit SKIP: syscall-from-clone step-from-clone: step-from-clone.c:195: main: Assertion `(status 8) == 5' failed. step-from-clone: step-from-clone.c:119: handler_fail: Assertion `0' failed. /bin/sh: line 5: 19825 Aborted ${dir}$tst FAIL: step-from-clone step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 19832 Aborted ${dir}$tst FAIL: step-fork 5 of 41 tests failed (10 tests were not run) Please report to utrace-devel@redhat.com make[3]: *** [check-TESTS] Error 1 make[3]: Leaving directory `/home/ananth/ptrace-tests/tests' make[2]: *** [check-am] Error 2 make[2]: Leaving directory `/home/ananth/ptrace-tests/tests' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/ananth/ptrace-tests/tests' make: *** [check-recursive] Error 1 - ptrace over utrace: PASS: ptrace-on-job-control-stopped PASS: attach-wait-on-stopped PASS: detach-can-signal PASS: attach-into-signal PASS: attach-sigcont-wait PASS: sa-resethand-on-cont-signal PASS: ptrace_cont-defeats-sigblock PASS: ptrace-cont-sigstop-detach PASS: ptrace_event_clone PASS: tif-syscall-trace-after-detach PASS: event-exit-proc-maps PASS: event-exit-proc-environ SKIP: x86_64-ia32-gs SKIP: x86_64-gsbase PASS: powerpc-altivec PASS: peekpokeusr PASS: watchpoint PASS: block-step PASS: step-jump-cont SKIP: step-jump-cont-strict PASS: ppc-dabr-race PASS: signal-loss PASS: step-into-handler SKIP: user-area-access PASS: user-regs-peekpoke PASS: erestartsys SKIP: erestart-debugger SKIP: step-to-breakpoint errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset PASS: reparent-zombie PASS: step-simple SKIP: step-through-sigret PASS: stop-attach-then-wait PASS: detach-stopped PASS: detach-stopped-rhel5 PASS: clone-multi-ptrace PASS: clone-ptrace PASS: o_tracevfork PASS: o_tracevforkdone PASS: detach-parting-signal PASS: detach-sigkill-race PASS: waitpid-double-report PASS: o_tracevfork-parent PASS: stopped-detach-sleeping PASS: stopped-attach-transparency SKIP: erestartsys-trap SKIP: highmem-debugger PASS: sigint-before-syscall-exit SKIP: syscall-from-clone SKIP: step-from-clone step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 24803 Aborted ${dir}$tst FAIL: step-fork 2 of 40 tests failed (11 tests were not run) Please report to utrace-devel@redhat.com make[3]: *** [check-TESTS] Error 1 make[3]: Leaving directory `/home/ananth/ptrace-tests/tests' make[2]: *** [check-am] Error 2 make[2]: Leaving directory `/home/ananth/ptrace-tests/tests' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/ananth/ptrace-tests/tests' make: *** [check-recursive] Error 1
GDB Testsuite Results on POWERPC
Hi, Here is the summary of GDB testsuite runs on a vanilla kernel and one with ptrace over utrace on a powerpc machine: Vanilla ptrace: === gdb Summary === # of expected passes13970 # of unexpected failures52 # of unexpected successes 2 # of expected failures 40 # of untested testcases 8 # of unresolved testcases 125 # of unsupported tests 55 runtest completed at Wed Nov 25 14:02:52 2009 Ptrace over utrace: === gdb Summary === # of expected passes13970 # of unexpected failures52 # of unexpected successes 2 # of expected failures 40 # of untested testcases 8 # of unresolved testcases 125 # of unsupported tests 55 runtest completed at Wed Nov 25 14:21:52 2009 Essentially, there is *no* change in any of the numbers with and without ptrace over utrace. Regards, Ananth
Re: [RFC,PATCH 0/14] utrace/ptrace
On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote: On 11/25, Ananth N Mavinakayanahalli wrote: I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace and then with ptrace/utrace. The results for ptrace/utrace look better :-) Great! thanks a lot Ananth for doing this. ptrace-utrace still fails 2 tests, FAIL: syscall-reset I'll take a look later. Since unpatched kernel fails this test too I am not going to worry right now. I think this is ppc specific, x86 passes this test. step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 24803 Aborted ${dir}$tst FAIL: step-fork This is expected. Should be fixed by ptrace-copy_process-should-disable-stepping.patch in -mm tree. (I am attaching this patch below just in case) I din't mention this patch in this series because this bug is ortogonal to utrace/ptrace. Oleg, The patch doesn't seem to fix the issue on powerpc: step-fork: step-fork.c:56: handler_fail: Assertion `0' failed. /bin/sh: line 5: 17325 Aborted ${dir}$tst FAIL: step-fork Ananth
Re: [PATCH 1-13] utrace-ptrace V1, for internal review
On Tue, Nov 24, 2009 at 04:26:57PM +0100, Oleg Nesterov wrote: On 11/24, Ananth N Mavinakayanahalli wrote: ... step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0' failed. /bin/sh: line 4: 9070 Aborted ${dir}$tst FAIL: step-jump-cont errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset Ananth, thanks a lot. Hi Oleg, Could you please verify that unpatched kernel passes these tests? There is one additional failure [1] on an unpatched kernel -- so the ptrace/utrace is better :-) Also, could you run make xcheck to do more testing? The output I provided was for a 'make check xcheck' run. Ananth [1] step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0' failed. /bin/sh: line 4: 16407 Aborted ${dir}$tst FAIL: step-jump-cont ... errno 14 (Bad address) syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed. unexpected child status 67f FAIL: syscall-reset ... FAIL: detach-stopped ... 3 of 27 tests failed (11 tests were not run) Please report to utrace-devel@redhat.com make[3]: *** [check-TESTS] Error 1 make[3]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests' make[2]: *** [check-am] Error 2 make[2]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests' make: *** [check-recursive] Error 1
Re: PTRACE_EVENT_SIGTRAP
On Thu, Oct 29, 2009 at 08:41:58PM +0100, Oleg Nesterov wrote: On 10/27, Roland McGrath wrote: @@ -495,8 +494,8 @@ static u32 ptrace_report_signal(u32 acti context-siginfo = NULL; if (resume != UTRACE_RESUME) { - set_stop_code(context, PTRACE_EVENT_SIGTRAP); - return UTRACE_STOP | UTRACE_SIGNAL_IGN; + info-si_signo = SIGTRAO; Surely you meant SIGTRAP here, right? :-)
Re: utrace-indirect branch
On Wed, Oct 28, 2009 at 06:55:32PM -0700, Roland McGrath wrote: Please take a look at the patch below and tell me what you think. This is the new(ish) utrace-indirect branch (not to be confused with what's now old/utrace-indirect). I first made it a while ago, but I don't recall if I ever mentioned it. This compiles and looks right, but I have not done any testing at all. Unlike the old code that freaked upstream reviewers all the way out, this has very simple lifetime rules for struct utrace. Once allocated, it lives until task_struct is freed. The utrace_task_alloc() logic covers the only race (at first attach), and that seems pretty easy to understand and be confident in. First glance, this looks much simpler and easier to understand compared to the earlier rcu implementation. ... + if (!utrace) { + if (unlikely(!utrace_task_alloc(target))) + return ERR_PTR(-ENOMEM); + utrace = target-utrace; utrace = task_utrace_struct(target); + } + engine = kmem_cache_alloc(utrace_engine_cachep, GFP_KERNEL); if (unlikely(!engine)) return ERR_PTR(-ENOMEM); Ananth
[PATCH] (trivial) Use set_stop_code() in ptrace_report_signal()
Trivial patch - looks like this was opencoded rather than using the helper Use set_stop_code() wherever possible. Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com --- kernel/ptrace.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) Index: ptrace-13oct/kernel/ptrace.c === --- ptrace-13oct.orig/kernel/ptrace.c +++ ptrace-13oct/kernel/ptrace.c @@ -331,8 +331,7 @@ static u32 ptrace_report_signal(u32 acti context-siginfo = NULL; if (resume != UTRACE_RESUME) { - context-stop_code = (PTRACE_EVENT_SIGTRAP 8) | SIGTRAP; - + set_stop_code(context, PTRACE_EVENT_SIGTRAP); return UTRACE_STOP | UTRACE_SIGNAL_IGN; }
[PATCH] (cosmetic) rename ptrace_context()
Rename ptrace_context() to get_ptrace_context() and use it for better code readability. (I know it doesn't take a ref or anything; feel free to ignore) Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com --- kernel/ptrace.c | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-) Index: ptrace-13oct/kernel/ptrace.c === --- ptrace-13oct.orig/kernel/ptrace.c +++ ptrace-13oct/kernel/ptrace.c @@ -59,7 +59,7 @@ static inline void set_stop_code(struct } static inline struct ptrace_context * -ptrace_context(struct utrace_engine *engine) +get_ptrace_context(struct utrace_engine *engine) { return engine-data; } @@ -124,7 +124,7 @@ static u32 ptrace_report_exit(enum utrac struct task_struct *task, long orig_code, long *code) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); WARN_ON(ptrace_event_pending(context)); @@ -171,7 +171,7 @@ static u32 ptrace_report_clone(enum utra unsigned long clone_flags, struct task_struct *child) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); int event, ret = UTRACE_RESUME; WARN_ON(ptrace_event_pending(context)); @@ -227,7 +227,7 @@ static u32 ptrace_report_syscall_entry(u struct task_struct *task, struct pt_regs *regs) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); WARN_ON(ptrace_event_pending(context)); @@ -241,7 +241,7 @@ static u32 ptrace_report_syscall_exit(en struct task_struct *task, struct pt_regs *regs) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); if (ptrace_event_pending(context)) return UTRACE_STOP; @@ -258,7 +258,7 @@ static u32 ptrace_report_exec(enum utrac const struct linux_binprm *bprm, struct pt_regs *regs) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); WARN_ON(ptrace_event_pending(context)); @@ -316,7 +316,7 @@ static u32 ptrace_report_signal(u32 acti const struct k_sigaction *orig_ka, struct k_sigaction *return_ka) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); enum utrace_resume_action resume = context-resume; if (ptrace_event_pending(context)) { @@ -367,7 +367,7 @@ static u32 ptrace_report_quiesce(u32 act struct task_struct *task, unsigned long event) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); if (ptrace_event_pending(context)) return UTRACE_STOP; @@ -395,7 +395,7 @@ static inline int __ptrace_set_options(s struct utrace_engine *engine, unsigned long options) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); /* * We need QUIESCE for resume handling, CLONE to check * for CLONE_PTRACE, other events are always reported. @@ -867,7 +867,7 @@ void ptrace_notify_stop(struct task_stru return; } - context = ptrace_context(engine); + context = get_ptrace_context(engine); if (WARN_ON(!ptrace_event_pending(context))) return; do_ptrace_notify_stop(context, tracee); @@ -896,7 +896,7 @@ static void do_ptrace_resume(struct utra enum utrace_resume_action action, long request, long data) { - struct ptrace_context *context = ptrace_context(engine); + struct ptrace_context *context = get_ptrace_context(engine); switch (get_stop_event(context)) { case 0: @@ -994,12 +994,14 @@ int ptrace_request(struct task_struct *c long addr, long data) { struct utrace_engine *engine = ptrace_lookup_engine(child); + struct ptrace_context *context; siginfo_t siginfo; int ret; if (unlikely(IS_ERR(engine))) return -ESRCH; + context
Re: Questions about utrace
On Sat, Sep 05, 2009 at 06:01:58PM +0300, Ali Polatel wrote: Hey everyone, I've been writing a ptrace based sandboxing tool, called sydbox¹, and I want to explain about some of my bad experiences with ptrace and whether utrace will fix these deficiencies. First of all ptrace() makes it rather hard writing portable code that will work for every architecture. You have to find out registry numbers for every architecture. It would be really nice if there were a common interface like utrace_get_syscall(), utrace_set_syscall() that works for every architecture supported. Have you looked at using syscall_get_(nr/error/arguments/return_value) et al in arch/arch/include/asm/syscall.h Basically sydbox intercepts some system calls and checks their arguments. Using ptrace, however, it's not possible to stop the children only at the entry of system calls we are interested in but we have to stop them at _every_ system call and check for the system call number. Because of this threaded applications run very slowly under sydbox as we have to stop them at every sched_yield(). I know that utrace will split PTRACE_SYSCALL into two calls SYSCALL_ENTRY and SYSCALL_EXIT and this is really cool but maybe the ability to stop the children only at system calls the caller is interested in is a better idea. Utrace engine callbacks happen in the traced task context and as such, are extremely fast. For the case you have, it isn't difficult for your ops-report_syscall_entry to either do a UTRACE_STOP/UTRACE_RESUME depending on what syscall_get_nr() returned. Aside, have you looked at seccomp? The utrace/seccomp branch on the utrace git tree has some work Roland had done earlier to make seccomp use utrace. Ananth
Re: [RFC, PATCH] teach utrace to destroy engine-data
On Fri, Sep 04, 2009 at 02:55:29AM -0700, Roland McGrath wrote: Simple answer. Because I do not know how to implement this. At least now. I tried to think of this more, but I don't see how to make the first steps. (Yes, to be honest, this looks like unnecessary complication to me, I have to admit. But this is not the reason.) Well, I am befuddled. I explained the model and it seems obvious to me how you'd go about it, the only nontrivial corner being coping with the MT exec reap case. But I won't let our mutual befuddlement stand in the way of making immediate progress the way you like best. Agreed, but nobody else cares ;) Ananth, Srikar, Jim, Frank? (Beuller?) We do have some people around here with some experience dealing with the API. They made me add the engine ref counts in the first place, for Pete's sake. They've got to have some opinions! I am not sure if all of what I say here makes any sense as I am still in the midst of understanding the discussion threads between the two of you (admittedly, its one hairy piece of the kernel). However, here goes: I prefer Roland's ops-release approach. However, is there any chance that -ops is no longer valid at the time __utrace_release_engine() is called? For the ptrace case at least, it'll always be valid, so there are no issues. The concept of how utrace engines use refcnts initially befuddled me. The entity that does a utrace_attach_*, without it having to explicitly ask for a utrace_engine_get(), already owns a reference. That was counter intuitive. (That's the reason for a series of Srikar's utrace_engine_put() fixes too, I guess). That apart, the problem of freeing -data is similar to what Prasad is trying to grapple in hardware watchpoints and its ptrace interaction. I guess its a problem unique to ptrace(?). The issue there, is that when ptrace did a register_user_hardware_watchpoint(), and that thread later died/exec'd, ptrace may (will?) not actually unregister the watchpoint, and the hwbkpt layer is left to fend for itself in cleaning up the struct hw_breakpoint in flush_thread_hardware_watchpoint(). Now, you could have non-ptrace users of the interface (who may have passed just a reference to a statically allocated struct hw_breakpoint) in which case, freeing the structure in the hwbkpt layer is just wrong. Right now, the way Prasad has fixed it is to have a check if the bp-triggered == ptrace_triggered, in which case, free up the structure. Yes, there is a kind of layering violation here -- hwbkpt now knows about ptrace_triggered. But I am not sure if there is a better way of fixing it. Ananth
ptrace/utrace branch/tree?
Oleg, From conversations between you and Roland, it looks like you're reworking quite a bit of the ptrace/utrace code. Wondering if you are hosting your changes in a git tree/branch? Regards, Ananth
[RFC][PATCH 0/3] Utrace based non-disruptive application core dumps - V1
This series is an RFC for utrace-based non-disruptive application core dumper. Per Roland McGrath, this is possibly one of the common use-cases for utrace. This is the first foray of mine into the hairy core dump land. Admittedly, I may have missed some (not so) subtle issues that need careful consideration. The current implementation uses a /proc trigger. Every process gains a new /proc/pid/gen_core 'write-only' file. Echoing a value of '1' will cause the core-dump to be generated. As with fatal core-dumps, the core file will be saved in the same directory as the location from where the application was started. The core dump has a format core.pid.timestamp, where the timestamp is the value of ktime_get() at the time of calling the binfmt-core_dump(). The current implementation uses a completion(). However, this precludes the possibility of a process syncrhonously asking for its own dump (trying this usecase causes the application to hang indefinitely). I am told this is a much needed feature that Linux lacks. Awaiting comments... Regards, Ananth
[PATCH 1/3] Create the /proc trigger
Create the /proc/pid/gen-core file that is the trigger for non-disruptive application core dumps. Writing any value other than 1 to this file is invalid. Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com --- fs/proc/Makefile |1 fs/proc/base.c |3 ++ fs/proc/internal.h |3 ++ fs/proc/proc_gencore.c | 65 + 4 files changed, 72 insertions(+) Index: utrace-13jul/fs/proc/Makefile === --- utrace-13jul.orig/fs/proc/Makefile +++ utrace-13jul/fs/proc/Makefile @@ -26,3 +26,4 @@ proc-$(CONFIG_PROC_VMCORE)+= vmcore.o proc-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o proc-$(CONFIG_PRINTK) += kmsg.o proc-$(CONFIG_PROC_PAGE_MONITOR) += page.o +proc-$(CONFIG_UTRACE) += proc_gencore.o Index: utrace-13jul/fs/proc/base.c === --- utrace-13jul.orig/fs/proc/base.c +++ utrace-13jul/fs/proc/base.c @@ -2558,6 +2558,9 @@ static const struct pid_entry tgid_base_ #if defined(USE_ELF_CORE_DUMP) defined(CONFIG_ELF_CORE) REG(coredump_filter, S_IRUGO|S_IWUSR, proc_coredump_filter_operations), #endif +#ifdef CONFIG_UTRACE + REG(gen_core, S_IWUSR, proc_gen_core_operations), +#endif #ifdef CONFIG_TASK_IO_ACCOUNTING INF(io, S_IRUGO, proc_tgid_io_accounting), #endif Index: utrace-13jul/fs/proc/internal.h === --- utrace-13jul.orig/fs/proc/internal.h +++ utrace-13jul/fs/proc/internal.h @@ -60,6 +60,9 @@ extern const struct file_operations proc extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; extern const struct inode_operations proc_net_inode_operations; +#ifdef CONFIG_UTRACE +extern const struct file_operations proc_gen_core_operations; +#endif /* CONFIG_UTRACE */ void free_proc_entry(struct proc_dir_entry *de); Index: utrace-13jul/fs/proc/proc_gencore.c === --- /dev/null +++ utrace-13jul/fs/proc/proc_gencore.c @@ -0,0 +1,65 @@ +/* + * Non-disruptive application core dump + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2009 + */ + +#include linux/seq_file.h +#include linux/utrace.h +#include internal.h + +#include asm/uaccess.h + +static ssize_t gen_core_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + char buffer[PROC_NUMBUF], *end; + unsigned int val; + int ret; + + ret = -EFAULT; + memset(buffer, 0, sizeof(buffer)); + if (count sizeof(buffer) - 1) + count = sizeof(buffer) - 1; + if (copy_from_user(buffer, buf, count)) + goto out_no_task; + + ret = -EINVAL; + val = (unsigned int)simple_strtoul(buffer, end, 0); + if (*end == '\n') + end++; + if ((end - buffer == 0) || (val == 0) || (val 1)) + goto out_no_task; + + ret = -ESRCH; + task = get_proc_task(file-f_dentry-d_inode); + if (!task) + goto out_no_task; + + ret = end - buffer; + + /* TODO: call core dumper */ + + put_task_struct(task); +out_no_task: + return ret; +} + +const struct file_operations proc_gen_core_operations = { + .write = gen_core_write, +};
[PATCH] Clarify UTRACE_ATTACH_EXCLUSIVE a bit more
More than one user has hit the -EEXIST problem when using utrace_attach_task and UTRACE_ATTACH_EXCLUSIVE without UTRACE_ATTACH_MATCH_DATA|_OPS. Document that a bit more. Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com --- kernel/utrace.c |4 1 file changed, 4 insertions(+) Index: utrace-28apr/kernel/utrace.c === --- utrace-28apr.orig/kernel/utrace.c +++ utrace-28apr/kernel/utrace.c @@ -214,6 +214,10 @@ static int utrace_add_engine(struct task * * UTRACE_ATTACH_MATCH_OPS: Only consider engines matching @ops. * UTRACE_ATTACH_MATCH_DATA: Only consider engines matching @data. + * + * Using UTRACE_ATTACH_EXCLUSIVE without either of UTRACE_ATTACH_MATCH_OPS + * or UTRACE_ATTACH_MATCH_DATA will result in a match for any existing + * engine for the task, causing an -%EEXIST return. */ struct utrace_engine *utrace_attach_task( struct task_struct *target, int flags,
Re: [PATCH 3/3] utrace-based ftrace process engine, v2
On Mon, Mar 23, 2009 at 10:54:09PM -0700, Andrew Morton wrote: On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote: On Sat, 21 Mar 2009 07:51:41 -0400 Frank Ch. Eigler f...@redhat.com wrote: On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: I have strong memories of being traumatised by reading the uprobes code. That was a long time ago wasn't it? :-) That approach was a carry over from an implementation from dprobes that used readdir hooks. Yes, that was not the most elegant approach, as such has long been shelved. What's the story on all of that nowadays? Utrace makes implementing uprobes more cleaner. We have a prototype that implements uprobes over utrace. Its per process, doesn't use any in-kernel hooks, etc. It currently has a kprobes like interface (needs a kernel module), but it shouldn't be difficult to adapt it to use utrace's user interfaces (syscalls?) when those come around. The current generation of uprobes that has all the bells and whistles can be found at http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2 However, there are aspects of the current uprobes that can be useful to any other userspace tracer: instruction analysis, breakpoint insertion and removal, single-stepping support. With these layered on top of utrace, building userspace debug/trace tools that depend on utrace should be easier, outside of ptrace. Work is currently on to factor these layers out. The intention is to upstream all the bits required for userspace tracing once utrace gets in, along with an easy to use interface for userspace developers (a /proc or /debugfs interface?) -- one should be able to use it on its own or with SystemTap, whatever they prefer. Details are still hazy at the moment. But, utrace is the foundation to do all of that. The sticking point was uprobes's modification of live pagecache. We said ick, COW the pages and you said too expensive. And there things remained. Is that all going to happen again? No. All modifications are via access_process_vm(). Ananth
Re: [PATCH 3/3] utrace-based ftrace process engine, v2
On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote: On Sat, 21 Mar 2009 07:51:41 -0400 Frank Ch. Eigler f...@redhat.com wrote: On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: I have strong memories of being traumatised by reading the uprobes code. That was a long time ago wasn't it? :-) That approach was a carry over from an implementation from dprobes that used readdir hooks. Yes, that was not the most elegant approach, as such has long been shelved. What's the story on all of that nowadays? Utrace makes implementing uprobes more cleaner. We have a prototype that implements uprobes over utrace. Its per process, doesn't use any in-kernel hooks, etc. It currently has a kprobes like interface (needs a kernel module), but it shouldn't be difficult to adapt it to use utrace's user interfaces (syscalls?) when those come around. The current generation of uprobes that has all the bells and whistles can be found at http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2 However, there are aspects of the current uprobes that can be useful to any other userspace tracer: instruction analysis, breakpoint insertion and removal, single-stepping support. With these layered on top of utrace, building userspace debug/trace tools that depend on utrace should be easier, outside of ptrace. Work is currently on to factor these layers out. The intention is to upstream all the bits required for userspace tracing once utrace gets in, along with an easy to use interface for userspace developers (a /proc or /debugfs interface?) -- one should be able to use it on its own or with SystemTap, whatever they prefer. Details are still hazy at the moment. But, utrace is the foundation to do all of that. Ananth
[BUG] utrace_attach_task() never returns when called from the report_clone callback
Roland, With the current utrace/master tree, I am seeing that utrace_attach_task() never returns when invoked from the clone callback. The same module works fine with prior utrace (rcu as well as with my embed version). The testcase is simple: a. attach an engine to attachstop-mt (from the gdb testsuite) _before_ it calls pthread_create. b. Watch for CLONE_THREAD and try to attach a utrace engine to the new thread. The utrace_attach_task() call never returns. If the utrace module is unloaded, the kernel barfs with the following innocuous information: BUG: unable to handle kernel paging request at fdff IP: [a012009a] 0xa012009a PGD 203067 PUD 204067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/devices/pci:01/:01:01.1/irq CPU 6 Modules linked in: big list [last unloaded: utrace_quiesce_threads] Pid: 6203, comm: attachstop-mt Not tainted 2.6.29-rc7-ut #1 eserver xSeries 366-[88632RA]- RIP: 0010:[a012009a] [a012009a] 0xa012009a RSP: 0018:8801d34ebe10 EFLAGS: 00010246 RAX: fdff RBX: 8801f11a36c0 RCX: c100 RDX: RSI: 8801dd0507f8 RDI: 88022daf4500 RBP: fff4 R08: 8801d34ea000 R09: 88022f2596a0 R10: 8800280b1600 R11: 0018 R12: 8801d34f1860 R13: 8802210dd300 R14: 8801dd07e2c0 R15: 003d0f00 FS: 7f58c8d286e0() GS:88022f18e5c0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: fdff CR3: 0002029bd000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process attachstop-mt (pid: 6203, threadinfo 8801d34ea000, task 8801d3512440) Stack: 003d0f00 8801d34f1860 8802210dd300 8801d3512440 8801d34ebe70 a012028d 8801dd050618 8801d35129e0 8801d35129d8 80260480 8801d34f1860 Call Trace: [80260480] ? utrace_report_clone+0x95/0xfc [80239120] ? do_fork+0x20b/0x2f3 [804a4035] ? do_page_fault+0x3c7/0x74e [8020c243] ? stub_clone+0x13/0x20 [8020bedb] ? system_call_fastpath+0x16/0x1b Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 RIP [a012009a] 0xa012009a RSP 8801d34ebe10 CR2: fdff ---[ end trace 96bb7eb644ab73a4 ]--- I have verified that the earlier version of utrace works just fine. In the earlier case, the engine would go directly on to the attached list if the calling task was the creator of the new thread. This has changed with the new implementation. I haven't yet zeroed in on what exact change caused this problem. Ananth
Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback
On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote: With the current utrace/master tree, I am seeing that utrace_attach_task() never returns when invoked from the clone callback. The same module works fine with prior utrace (rcu as well as with my embed version). I changed the utrace_attach_delay() logic recently. That is probably it. Right, reverting dd30e86355e fixes the problem. Please post your test case. Here it is -- does nothing much really :) I used this module in conjunction with attachstop_mt with an engine attaching to it before the pthread_create(). --- #include linux/module.h #include linux/utrace.h #include linux/err.h MODULE_DESCRIPTION(Utrace tests); MODULE_LICENSE(GPL); static int target_pid; module_param_named(pid, target_pid, int, 0); /* Structure for all threads of a process having the same utrace ops */ struct proc_utrace { struct task_struct *tgid_task; /* list of task_utrace structs */ struct list_head list; unsigned int num_threads; }; struct task_utrace { struct list_head list; struct task_struct *task; /* TODO: Get rid of this and use MATCHING_OPS on task? */ struct utrace_engine *engine; }; static const struct utrace_engine_ops ut_ops; static struct task_utrace *get_task_ut(struct task_struct *task, struct proc_utrace *proc_ut) { struct task_utrace *task_ut, *temp; list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) { if (task_ut-task == task) return task_ut; } return NULL; } static int cleanup_proc_ut(struct proc_utrace *proc_ut) { int ret = 0; struct task_utrace *task_ut, *temp; printk(KERN_INFO Cleanup_proc_ut\n); if (proc_ut == NULL) return 0; if (list_empty(proc_ut-list)) goto out; /* walk proc_ut-list and free task_ut */ list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) { if (task_ut-engine) { printk(KERN_INFO Calling detach for %d\n, task_pid_nr(task_ut-task)); ret = utrace_control(task_ut-task, task_ut-engine, UTRACE_DETACH); if (ret) printk(KERN_INFO utrace_detach returned %d\n, ret); printk(KERN_INFO Detached engine for %d\n, task_pid_nr(task_ut-task)); } list_del(task_ut-list); kfree(task_ut); } out: kfree(proc_ut); return ret; } static int setup_task_ut(struct task_struct *t, struct proc_utrace *proc_ut) { struct task_utrace *task_ut; int ret = 0; if (!t || !proc_ut) return -EINVAL; printk(KERN_INFO setup_task_ut: attaching for task %d\n, task_pid_nr(t)); task_ut = kzalloc(sizeof(*task_ut), GFP_KERNEL); if (!task_ut) return -ENOMEM; INIT_LIST_HEAD(task_ut-list); task_ut-task = t; list_add_tail(task_ut-list, proc_ut-list); /* * The utrace engine's *data will point to proc_ut. */ printk(KERN_INFO Before utrace_attach_task: %d\n, task_pid_nr(t)); task_ut-engine = utrace_attach_task(t, UTRACE_ATTACH_CREATE, ut_ops, proc_ut); printk(KERN_INFO After utrace_attach_task: %d, engine = %p\n, task_pid_nr(t), task_ut-engine); if (IS_ERR(task_ut-engine)) { printk(KERN_ERR utrace_attach_task returned %d\n, (int)PTR_ERR(task_ut-engine)); task_ut-engine = NULL; ret = -ESRCH; goto out; } printk(KERN_INFO utrace_attach_task: SUCCESS! - engine = %p\n, task_ut-engine); if (utrace_set_events(t, task_ut-engine, UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT))) { ret = -ESRCH; } proc_ut-num_threads++; out: return ret; } static u32 ut_quiesce(enum utrace_resume_action action, struct utrace_engine *engine, struct task_struct *task, unsigned long event) { printk(KERN_INFO In quiesce callback: tid = %d\n, task_pid_nr(task)); return UTRACE_RESUME; } /* clone handler -- handle thread spawns and forks */ static u32 ut_clone(enum utrace_resume_action action, struct utrace_engine *engine, struct task_struct *parent, unsigned long clone_flags, struct task_struct *child) { struct proc_utrace *proc_ut = (struct proc_utrace *)engine-data;
Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback
On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote: With the current utrace/master tree, I am seeing that utrace_attach_task() never returns when invoked from the clone callback. The same module works fine with prior utrace (rcu as well as with my embed version). I changed the utrace_attach_delay() logic recently. That is probably it. Please post your test case. The issue is that target-real_parent == current-real_parent and not current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the do-while. Ananth
Re: [PATCH] Embed struct utrace in task_struct - V2
On Wed, Jan 21, 2009 at 11:58:25AM +0530, Ananth N Mavinakayanahalli wrote: On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote: Thanks for working on this, Ananth. (Btw, it's embed.) I think it would be less disruptive (and materially no different) to leave utrace_flags as it is. That field is the one (and only) that is used in hot paths (or used anywhere outside utrace.c). It might in future get moved around to stay in a cache-hot part of task_struct, for example. The long comment above struct utrace is really all about implementation details inside utrace.c and I don't think you should move that commentary to the header file. Instead, put a comment saying that the contents of struct utrace and their use is entirely private to kernel/utrace.c and it is only defined in the header to make its size known for struct task_struct layout (and init_task.h). I committed some cosmetic changes that will make for a little less flutter in your patch. Here is V2 of the patch. Tested and works fine. Same two tests on powerpc fail, all tests pass on x86, while there are some occurances of the ptrace.c WARN_ON. Roland, I've tried to tweak the comments appropriately. Please feel free to modify them as you consider fit. Roland, Any updates on this and the utrace upstream integration front? Ananth
Utrace in -next tree?
Roland, What are your thoughts of getting utrace git tree into linux-next? That way, utrace will have more extensive visibility and testing. Ananth
[PATCH] Fix spin_unlock order in utrace_stop
From: Ananth N Mavinakayanahalli [EMAIL PROTECTED] utrace_stop() seems to get the spin_unlock sequence inverted in one of the unlikely branches. Fix it. Signed-off-by: Ananth N Mavinakayanahalli [EMAIL PROTECTED] --- kernel/utrace.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: utrace-2sep/kernel/utrace.c === --- utrace-2sep.orig/kernel/utrace.c +++ utrace-2sep/kernel/utrace.c @@ -482,8 +482,8 @@ static bool utrace_stop(struct task_stru spin_lock_irq(task-sighand-siglock); if (unlikely(sigismember(task-pending.signal, SIGKILL))) { - spin_unlock(utrace-lock); spin_unlock_irq(task-sighand-siglock); + spin_unlock(utrace-lock); return true; }
BUG: Sleeping function called from invalid context...
Roland, Occasionally, I see: attached to 1717 = 0xde089000 utrace_control: -EINPROGRESS pid 1717 reports quiesced to 0xde089000 BUG: sleeping function called from invalid context at kernel/sched.c:5428 in_atomic():0, irqs_disabled():1 no locks held by uttest/1717. irq event stamp: 1396 hardirqs last enabled at (1395): [c0438309] trace_hardirqs_on+0xb/0xd hardirqs last disabled at (1396): [c04d2648] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (994): [c0420091] __do_softirq+0x97/0x9e softirqs last disabled at (987): [c04056c5] do_softirq+0x59/0xa8 Pid: 1717, comm: uttest Not tainted 2.6.27-rc3 #1 [c04178dd] __might_sleep+0xb3/0xb8 [c041891f] __cond_resched+0x12/0x3e [c05dbee8] _cond_resched+0x27/0x32 [c04437ef] finish_callback+0x37/0xb4 [c04438b0] start_callback+0x44/0x85 [c0444587] utrace_resume+0xa8/0x106 [c0403315] do_notify_resume+0x630/0x654 [c04375db] ? trace_hardirqs_off+0xb/0xd [c043755d] ? trace_hardirqs_off_caller+0x14/0x87 [c04375db] ? trace_hardirqs_off+0xb/0xd [c040736a] ? native_sched_clock+0x71/0x83 [c0436fd8] ? lock_release_holdtime+0x2b/0x12e [c04a392d] ? dnotify_parent+0x5c/0x61 [c0473288] ? fsnotify_access+0x54/0x5f [c04fc958] ? tty_read+0x0/0xa9 [c0473f4f] ? vfs_read+0xa6/0xdf [c0403ab6] work_notifysig+0x13/0x19 === pid 1717 reports quiesced to 0xde089000 autodetach 0xde089000 for 1717 (by 1717) cannot find PID 1717 The utrace module is a rehased version of test-quiesce.c from the ntrace tarball to work with the new interfaces. The report_quiesce handler just returns UTRACE_RESUME. This is on an x86_32 machine with the latest utrace git and the config has CONFIG_SMP=n and CONFIG_PREEMPT_VOLUNTARY=y. Ananth
Re: BUG: Sleeping function called from invalid context...
On Wed, Aug 20, 2008 at 11:48:00AM -0700, Roland McGrath wrote: Thanks for the report, Ananth. Ah! The i386 will enter do_notify_resume() with interrupts disabled. Other machines don't do this (x86-64 and powerpc64, anyway). It is often harmless, because if TIF_SIGPENDING is set, we'll first enter get_signal_to_deliver() and implicitly reenable interrupts by virtue of spin_lock_irq + spin_unlock_irq. If we don't enter that path, we get straight to utrace_notify_resume() with interrupts still disabled. I've merged the following fix in. Please verify that it works well for you. The patch does fix the problem. Thanks Roland! Ananth
Re: RCU, reference counts
On Tue, Aug 05, 2008 at 07:20:42PM -0700, Roland McGrath wrote: Yes! [...] What is the use case for a utrace client to do a utrace_engine_get/put()? Wouldn't it be more robust if utrace implicitly handles refcounts as you've detailed below? If the only operations that affect this count are implicit, then I assume you must mean those are attach and detach. What today's utrace has now is an implicit count: it's 1 when attached, and 0 when not. It's perfectly robust as described. Since it's implicit, you have to observe all those squirrelly rules about synchronizing with UTRACE_DETACH that I've been talking about. If there is no utrace_engine_put, then what difference are you saying an enthusiastic Yes! to? Explain what you have in mind that would be somehow different from what we already had, but have no utrace_engine_put. My initial opinion was that you were moving away from RCU to also rid the task_struct-utrace assertion failure, which IIRC from some of the investigations at the time, were mostly for RCU lifetime reasons. Ananth
Re: asynchronous detach
On Thu, Jul 31, 2008 at 03:38:02PM -0700, Roland McGrath wrote: ... I think this can be a good model to use for non-perturbing quiesce for cases as breakpoint insertion and removal or any applicaton text modification, where one needs to ensure no thread is executing in the vicinity of the said modification. So, even on an -EINPROGRESS on a utrace_control, if the thread is in kernel, manipulating application text is still safe, right? For this particular case, maybe safe enough. I wouldn't like to encourage thinking of it as safe in general. Whatever the kernel is doing might be touching user memory, or unmapping it or whatever. For breakpoint insertion/removal, you are pretty much assuming nothing else would ever touch the text anyway. You still need an interlock against munmap text + mmap reusing addrs in quick succession. I suppose you could hold mmap_sem or something. Not sure if a similar model can be used to address the teardown races problem. Hmm. I mentioned before having a non-synchronizing record of the callback in progress. This lets UTRACE_DETACH return -EINPROGRESS, which is after the horse has left the barn, as it were. There is a related extension of this idea I had vaguely in mind but didn't mention. A utrace_set_events call that disables event bits could also return -EINPROGRESS when some callback might be in progress. As with detach, that means a zero return tells you all is well. Unlike detach, if you get -EINPROGRESS from utrace_set_events, you still have the engine. So, the intention is that -EINPROGRESS can be returned only on a utrace_set_events(task, engine, 0), right? Actually, I think it would also be useful to just have a utrace_clear_events_sync() that encapsulates the above call. I can see use for it outside of an asynchronous detach, for cases when one needs to just turn tracing events off on a thread. (It is definitely not recommended for debugger/probes usage with breakpoints already set in user code, but would be useful for someone doing non-invasive syscall and process lifetime tracing). If your callback never blocks, that is enough to know that it's safe to free your data, unload your module, etc. (Or if your callback can block but what you want to synchronize with is just what it does before it blocks. That's when you don't care about being able to unload the module, just juggling your own data structures.) For the general case of some unknown callback code that might include blocking, you could instead do: ret = utrace_set_events(task, engine, 0); while (ret == -EINPROGRESS) { ret = utrace_prepare_examine(task, engine, exam); if (ret == -EAGAIN) yield(); ret = utrace_set_events(task, engine, 0); if (ret == -EINPROGRESS) yield(); } That is: when it is blocked, check again if a callback still might be in progress, and loop. Assuming your callback is well-behaved and only ever blocks for a short time (like kmalloc), this will return soon. This can cetainly work! yield(); should actually be schedule_timeout_interruptible(1); probably, and it should be able to return -ERESTARTSYS. If this is a good plan, then we should encapsulate it in a new call utrace_set_events_sync() or something like that. So then the safe detach procedure would be just: ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP)); if (!ret) ret = utrace_control(task, engine, UTRACE_DETACH); This can only return zero or -ESRCH. Zero says that all callbacks are definitely finished. -ESRCH says that your report_reap callback is definitely happening (right now, modulo preemption). We could encapsulate that into utrace_detach_sync() too. Yes, this does take care of the thread about to die case too. Ananth
Re: crash-suspend teardown races
On Wed, Jul 30, 2008 at 04:19:44PM -0500, David Smith wrote: ... @@ -197,14 +224,33 @@ static void __exit exit_crash_suspend(vo error, t-pid); } else { - int ret = utrace_control(t, engine, UTRACE_DETACH); + int ret = utrace_set_events(t, engine, + SHUTDOWN_EVENTS); WARN_ON(ret); + + ret = utrace_control(t, engine, UTRACE_STOP); + if (ret == -EINPROGRESS) { + ret = utrace_control(t, engine, + UTRACE_INTERRUPT); + WARN_ON(ret); + } + else { + WARN_ON(ret); + } ++n; } } rcu_read_unlock(); - printk(detached from %d threads, unloading\n, n); + printk(KERN_INFO requested %d threads to stop\n, n); + + if (wait_event_timeout(crash_suspend_wq, +(atomic_read(attach_count) = 0), +60 * HZ) = 0) { + printk(KERN_ERR gave up waiting.\n); + } + printk(KERN_INFO unloading, %d threads left to stop\n, +atomic_read(attach_count)); } module_init(init_crash_suspend); Thinking aloud, utrace_control(UTRACE_STOP) returns -EINPROGRESS for threads not yet stopped a. possibly still in userspace, yet to pass through a quiesce safe point b. blocked in the kernel on a syscall or an exception. Would task_current_syscall() help here? On a -EINPROGRESS return, one can check for a 0 return from task_current_syscall() which'd mean the thread is in the kernel. Wouldn't that mean that the thread will eventually do a report_quiesce? I think this can be a good model to use for non-perturbing quiesce for cases as breakpoint insertion and removal or any applicaton text modification, where one needs to ensure no thread is executing in the vicinity of the said modification. So, even on an -EINPROGRESS on a utrace_control, if the thread is in kernel, manipulating application text is still safe, right? Not sure if a similar model can be used to address the teardown races problem. Thoughts? Ananth
Re: utrace status
Roland, Here is a minor fix for powerpc syscalls.h. --- From: Ananth N Mavinakayanahalli [EMAIL PROTECTED] Remove return 0 from static inline void syscall_set_arguments() in powerpc/syscalls.h --- include/asm-powerpc/syscall.h |1 - 1 file changed, 1 deletion(-) Index: kernel-utrace-8jul/include/asm-powerpc/syscall.h === --- kernel-utrace-8jul.orig/include/asm-powerpc/syscall.h +++ kernel-utrace-8jul/include/asm-powerpc/syscall.h @@ -67,7 +67,6 @@ static inline void syscall_set_arguments { BUG_ON(i + n 6); memcpy(regs-gpr[3 + i], args, n * sizeof(args[0])); - return 0; } #endif /* _ASM_SYSCALL_H */
Re: utrace status
On Wed, Jul 02, 2008 at 08:02:36PM -0700, Roland McGrath wrote: The status of the ptrace cooperation code looks OK but it is not very well tested. I've really only tested on x86-64 lately. It passed most of the ptrace-tests, but I haven't done anything big like gdb. A simple update to ensure ptrace-utrace co-operation patch builds on powerpc to aid in testing/debug. Srini's task_pt_regs() patch is also required for the build to be successful. Signed-off-by: Ananth N Mavinakayanahalli [EMAIL PROTECTED] --- kernel/ptrace.c | 25 + 1 file changed, 25 insertions(+) Index: kernel-utrace-8jul/kernel/ptrace.c === --- kernel-utrace-8jul.orig/kernel/ptrace.c +++ kernel-utrace-8jul/kernel/ptrace.c @@ -230,10 +230,17 @@ static u32 utrace_ptrace_report(u32 acti task-exit_code = code; +#ifdef CONFIG_X86 printk(XXX %d ptrace_report act %x (%x) code %x ax %ld orig_ax %ld\n, task-pid, action, ptrace_resume_action(task), task-exit_code, task_pt_regs(task)-ax, task_pt_regs(task)-orig_ax); +#elif defined(CONFIG_PPC) + printk(XXX %d ptrace_report act %x (%x) code %x gpr3 %ld orig_gpr3 %ld\n, + task-pid, action, ptrace_resume_action(task), + task-exit_code, + task_pt_regs(task)-gpr[3], task_pt_regs(task)-orig_gpr3); +#endif WARN_ON(1); return action | UTRACE_STOP; @@ -340,8 +347,13 @@ static u32 ptrace_report_syscall(enum ut int code = SIGTRAP; if (task-ptrace PT_TRACESYSGOOD) code |= 0x80; +#ifdef CONFIG_X86 printk(XXX %d syscall code %x ax %ld orig_ax %ld\n, task-pid, task-exit_code, regs-ax, regs-orig_ax); +#elif defined(CONFIG_PPC) + printk(XXX %d syscall code %x gpr3 %ld orig_gpr3 %ld\n, + task-pid, task-exit_code, regs-gpr[3], regs-orig_gpr3); +#endif WARN_ON(1); return utrace_ptrace_report(0, task, code); } @@ -398,9 +410,15 @@ static u32 ptrace_report_signal(u32 acti const struct k_sigaction *orig_ka, struct k_sigaction *return_ka) { +#ifdef CONFIG_X86 printk(XXX %d signal %d act %x (%x) code %x ax %ld orig_ax %ld\n, task-pid, info-si_signo, action, ptrace_resume_action(task), task-exit_code, regs-ax, regs-orig_ax); +#elif defined(CONFIG_PPC) + printk(XXX %d signal %d act %x (%x) code %x gpr3 %ld orig_gpr3 %ld\n, + task-pid, info-si_signo, action, ptrace_resume_action(task), + task-exit_code, regs-gpr[3], regs-orig_gpr3); +#endif WARN_ON(1); if (info-si_signo == 0 || @@ -418,10 +436,17 @@ static u32 ptrace_report_quiesce(u32 act struct task_struct *task, unsigned long event) { +#ifdef CONFIG_X86 printk(XXX %d quiesce %lx act %x (%x) code %x ax %ld orig_ax %ld\n, task-pid, event, action, ptrace_resume_action(task), task-exit_code, task_pt_regs(task)-ax, task_pt_regs(task)-orig_ax); +#elif defined(CONFIG_PPC) + printk(XXX %d quiesce %lx act %x (%x) code %x gpr3 %ld orig_gpr3 %ld\n, + task-pid, event, action, ptrace_resume_action(task), + task-exit_code, + task_pt_regs(task)-gpr[3], task_pt_regs(task)-orig_gpr3); +#endif WARN_ON(1); task-last_siginfo = NULL; return ptrace_resume_action(task);
Re: user_regset is in!
On Wed, Jan 30, 2008 at 12:22:58PM -0800, Roland McGrath wrote: The generic and x86 code for user_regset went into Linus's kernel tree today, destined for the 2.6.25 release. I'm very grateful to Ingo Molnar, who helped this happen via the x86.git tree. I've also had some positive feedback from the powerpc maintainer, and expect the powerpc user_regset code to go in upstream soon as well. Paul Mackerras has pulled in the regset changes. Should be in Linus' tree tomorrow. Ananth
Re: incremental arch work
On Tue, Nov 20, 2007 at 08:29:19PM -0800, Roland McGrath wrote: Here are the steps I have in mind. I think this work should be pretty well clear to merge upstream without much controversy. Basically, this is the arch parts now done in the tracehook and regset patches, with a little sugar. Several of these steps can be done in parallel and merged upstream independently. ... Once upstream arch code has merged all the steps above, there will be no more arch changes--or very nearly none, anyway--required to work with a later merging of utrace or something else like it. I've thought about ways to be more incremental about the core changes, too. But if we can get the ball rolling with the arch changes and get a majority of upstream arch trees converted over, that will be a first big win. Now that code related to most of the steps you outlined in this thread are scheduled to be pushed upstream (thanks a ton for the work!), from a (!ptrace) utrace client point of view, the two major remaining components would be the tracehooks and the engine infrastructure. We have quite a bit of code in the uprobes codebase that would be of interest to the larger utrace-client community. These include: - Breakpoint assistance (including single-stepping out of line) - Quiescing all threads of a process. From a utrace-client (and long term maintenance perspective), we'd like to factor these out. (It'd also make the uprobes codebase much leaner). - What'd be your suggestions on that front? Do we just base these off of the current utrace engine infrastructure? - Are the tracehooks and engines the next targets upstream? Possibly 2.6.26? - Do you have any changes in mind from a utrace-client perspective that we need to be cognizant of? Please advise. Ananth