Re: How can I use @cast in the user-space's program or have other ways to assign values to the user-space pointer to a structure.
Liu,

This is really a question for the systemtap list, not the utrace list. Forwarding there. Also see possible answer below.

On 03/26/2012 10:06 PM, Liu Tianhao wrote:
> I have a problem to cast a pointer to a structure in the user-space's program. It always report "ERROR: kernel write fault at 0x00400675 (addr) near identifier '@cast' at test.stp:3:8".
>
> Compile the source file and execute the stap command.
>
> liuth@liuthivb:~/$ gcc -g -o test test.c
> liuth@liuthivb:~/$ sudo stap -w -vg test.stp -c ./test
> Pass 1: parsed user script and 81 library script(s) using 49344virt/22060res/2024shr kb, in 130usr/0sys/125real ms.
> Pass 2: analyzed script: 2 probe(s), 9 function(s), 0 embed(s), 0 global(s) using 51992virt/23168res/2540shr kb, in 10usr/0sys/5real ms.
> Pass 3: using cached /home/liuth/.systemtap/cache/5c/stap_5c288dc4a44724d509924f222aedb626_9050.c
> Pass 4: using cached /home/liuth/.systemtap/cache/5c/stap_5c288dc4a44724d509924f222aedb626_9050.ko
> Pass 5: starting run.
> hello world
> call--call
> The value of a:[F] The value of b:[10]
> call--call
> ERROR: kernel write fault at 0x004005b5 (addr) near identifier '@cast' at test.stp:3:8
> Pass 5: run completed in 10usr/0sys/589real ms.
> Pass 5: run failed. Try again with another '--vp 1' option.
>
> I have modified the test.stp as follows.
>
> probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("funcStruct").call
> {
>     // compilation error
>     // @cast($pStruct, struct TestStruct, test.h )->a = 31
>     //@cast($pStruct, struct TestStruct, test.h )->b = 32
>
>     // ERROR: kernel write fault at 0x004005b5 (addr) near identifier '@cast' at test.stp:3:8
>     //@cast($pStruct, struct TestStruct, test.h )->a = 31
>     //@cast($pStruct, struct TestStruct, test.h )->b = 32
>
>     // ERROR: kernel read fault at 0x0020001f (addr) near identifier '$pStruct' at test.stp:5:60
>     //@cast($pStruct, struct TestStruct, test.h )->a = 31
>     //@cast($pStruct, struct TestStruct, test.h )->b = 32
>
>     @cast($pStruct, "struct TestStruct")->a = 31
>     @cast($pStruct, "struct TestStruct")->b = 32
>     printf("The value of a:[%X] The value of b:[%X]\n", $pStruct->a, $pStruct->b)
> }

Hmm, what happens when you just use the pointer directly, like this:

    $pStruct->a = 31
    $pStruct->b = 32

> The following are the program and the script.
>
> Header file test.h:
>
> #include <stdlib.h>
> #include <stdio.h>
>
> typedef struct TestStruct
> {
>     int a;
>     int b;
> } ST_Test_Struct;
>
> //int func(int a, int b, int c)
> int func(ST_Test_Struct tmpStruct);
> int funcStruct(ST_Test_Struct* pStruct);
>
> source file test.c:
>
> #include "test.h"
>
> int func(ST_Test_Struct tmpStruct)
> {
>     return tmpStruct.a + tmpStruct.b;
> }
>
> int funcStruct(ST_Test_Struct* pStruct)
> {
>     return pStruct->a + pStruct->b;
> }
>
> int main(int argc, char** argv)
> {
>     ST_Test_Struct tmpStruct = { 1,2 };
>     func(tmpStruct);
>     funcStruct(&tmpStruct);
>     printf("hello world\n");
>     return 0;
> }
>
> script test.stp:
>
> probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("funcStruct").call
> {
>     @cast($pStruct, "struct TestStruct")->a = 31
>     @cast($pStruct, "struct TestStruct")->b = 32
>     printf("The value of a:[%X] The value of b:[%X]\n", $pStruct->a, $pStruct->b)
> }
>
> probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("func").call
> {
>     printf("call--call\n")
>     $tmpStruct->a = 15;
>     $tmpStruct->b = 16;
>     printf("The value of a:[%X] The value of b:[%X]\n", $tmpStruct->a, $tmpStruct->b)
>     printf("call--call\n")
> }

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
On 06/21/2011 10:25 AM, Oleg Nesterov wrote:
> OK, I won't argue. So we need to rework utrace/ptrace in 3.0, then we should do this again in 3.1. I'll try to do something.

I have a thought here, I'm not familiar enough with utrace internals to know whether it is a good one or not.

Originally, implementing ptrace via utrace was optional - if I remember correctly there was a define that turned it off and on. One of the good things about implementing ptrace-via-utrace is that they co-existed well - you could ptrace a process that you were also using utrace on.

If I remember correctly, during one of the utrace reviews, making ptrace-via-utrace optional was requested to be removed.

Now with the recent ptrace changes, still implementing ptrace-via-utrace will take some work.

OK, so here's my (hacky) idea:

(1) Forget ptrace-via-utrace. Have utrace be a separate thing. This way the recent ptrace changes won't matter.

(2) But, what about ptrace co-existing well with utrace? Make them mutually exclusive - a ptraced-process can't be utraced and a utraced-process can't be ptraced.

Assuming the above is a semi-reasonable idea, it might be a lot less work than updating the ptrace-via-utrace code to handle the new ptrace changes.

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
Re: resuming after stop at syscall_entry
> /. Those can illustrate things with good comments, and also could be built verbatim to load multiple ones/instances in different orders and demonstrate what happens, etc.

The wiki would be fine - just somewhere that people could see this stuff.

> It would be nice to have folks like you and Renzo work up this text and/or examples. What's needed is stuff that makes sense to you guys as users of the API, rather than what makes sense to me who has thought too much already about all this stuff.

We should probably just dump your email into the wiki.

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
Re: syscall tracing overheads: utrace vs. kprobes
Frank Ch. Eigler wrote:
> Hi -
>
> In a few contexts, it comes up as to whether it is faster to probe process syscalls with kprobes or with something higher level such as utrace. (There are other hypothetical options too (per-syscall tracepoints) that could be measured this way in the future.)

These scenarios are a bit wrong:

> Now we compare these scenarios:
>
> # stap -e 'probe never {}' -t --vp 1 -c a.out
>
> Here, no actual probing occurs so we get a measurement of the plain uninstrumented run time of ten million close(2)s.

The above one is fine.

> # stap -e 'probe process.syscall {}' -t --vp 1 -c a.out
>
> Here, we intercept sys_close with a kprobe. If the system is not too busy, we should pick up only the close(2)s coming from a.out, though a few close(2)'s executed by other processes may show up.
>
> # stap -e 'probe syscall.close {}' -t --vp 1 -c a.out
>
> Here, we intercept all a.out's syscalls with utrace. Other processes are not affected at all, but other syscalls by a.out would be -- though in our test, there are hardly any of those.

These 2 are swapped: the 'process.syscall' probe is a utrace-based probe and the 'syscall.close' probe is a kprobe-based probe. Note that in the results, the description and probe types matched correctly.

> Some typical results on my 2.66GHz 2*Xeon5150 machine running Fedora 9 - 2.6.27.12:
>
> never:
> Pass 5: run completed in 740usr/3310sys/4155real ms.
>
> kprobe:
> probe syscall.close (input:1:1), hits: 1028, cycles: 176min/202avg/3632max
> Pass 5: run completed in 750usr/9320sys/10193real ms.
>
> utrace:
> probe process.syscall (input:1:1), hits: 1025, cycles: 176min/209avg/184392max
> Pass 5: run completed in 1670usr/6860sys/8645real ms.
>
> So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the uninstrumented 4.1 second run time. But wait: we should subtract the time taken by the probe handler itself: 200ish cycles at 2.66 GHz, which is about 0.75 seconds. So the overheads are approximately:
>
> never:  n/a
> kprobe: 5.2 seconds = 0.52 us per hit
> utrace: 3.6 seconds = 0.36 us per hit
>
> Note that these are microbenchmarks that represent an ideal case compared to a larger run, since they probably fit comfily inside caches. They probably also undercount the probe handler's run time.

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
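The overhead arithmetic in the message above can be reproduced with a short C sketch. It is only an illustration under stated assumptions: ten million probe hits (the test performs ten million close(2) calls), a flat 200-cycle handler, and a 2.66 GHz clock. The function names are made up for this example. Because it works from the unrounded wall-clock times rather than the rounded 6.0 and 4.5 second figures, the per-hit results come out near, but not exactly on, the 0.52 us and 0.36 us quoted in the message.

```c
#include <stdio.h>

/* Assumptions for illustration, taken from the message above:
 * ten million hits, a 2.66 GHz clock, and a flat 200-cycle handler. */
#define HITS        10000000.0
#define CLOCK_HZ    2.66e9
#define HANDLER_CYC 200.0

/* Seconds spent inside the probe handler itself, summed over all hits. */
double handler_seconds(void)
{
    return HANDLER_CYC * HITS / CLOCK_HZ;
}

/* Per-hit overhead in microseconds: the extra wall-clock time of the
 * probed run over the uninstrumented run, minus the handler's own time. */
double overhead_us_per_hit(double probed_real_ms, double baseline_real_ms)
{
    double added_s = (probed_real_ms - baseline_real_ms) / 1000.0;
    double net_s = added_s - handler_seconds();
    return net_s / HITS * 1e6;
}

/* Print the per-hit figures for the two measured runs. */
void print_overheads(void)
{
    printf("kprobe: %.2f us per hit\n", overhead_us_per_hit(10193.0, 4155.0));
    printf("utrace: %.2f us per hit\n", overhead_us_per_hit(8645.0, 4155.0));
}
```

Calling print_overheads() from a main() prints the two per-hit figures; the inputs are the "real ms" values from the three Pass 5 lines.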
Re: resuming after stop at syscall_entry
Roland McGrath wrote:

This processing makes sense I think. It is a bit complicated of course, but not unnecessarily so. I'd like to ask you how this stuff would relate to systemtap (so I've added the systemtap mailing list). I've interspersed a few comments/questions below.

... stuff deleted ...

> SYSCALL_ENTRY is unlike all other events. Right after this callback loop is when the important user-visible stuff happens (the system call). So we stop immediately there as for the other two. But, if another engine used UTRACE_STOP and maybe did something asynchronously, like modifying the syscall argument registers, you get no opportunity to see what happened. Once all engines lift UTRACE_STOP, the system call runs.

... stuff deleted ...

> As explained above, the norm of interacting with other engines and their use of UTRACE_STOP is to use the final report. When your callback's action argument includes UTRACE_STOP, you know an earlier engine might be fiddling before the thread resumes. So, your callback can decide to return UTRACE_REPORT. That ensures that some report_quiesce (or report_signal/UTRACE_SIGNAL_REPORT) callback will be made after the other engine lifts its UTRACE_STOP and before user mode. At that point, you can see what user register values it might have installed, etc.
>
> In all events but syscall entry, a final report_quiesce(0) serves this need. My proposal is to extend this resume report approach to the syscall entry case. That is, after when some report_syscall_entry returned UTRACE_STOP so we've stopped, allow for a second reporting pass after we've been resumed, before running the system call. You'd get this pass if someone used UTRACE_REPORT.
>
> That is, in the first callback loop, one engine used UTRACE_STOP and another used UTRACE_REPORT. Then when the first engine used utrace_control() to resume, there would be a second reporting pass because of the second engine's earlier request.
> Or, even if there was just one engine, but it used UTRACE_STOP and then used utrace_control(UTRACE_REPORT) to resume, then it would get the second reporting pass. If someone uses UTRACE_STOP+UTRACE_REPORT in that pass, there would be a third pass, etc.
>
> What I have in mind is that the second (and however many more) pass would just be another report_syscall_entry callback to everyone with UTRACE_EVENT(SYSCALL_ENTRY) set. A flag bit in the action argument says this is a repeat notification. I think this strikes a decent balance of not adding more callbacks and more arguments to bloat the API in general, while imposing a fairly simple burden on engines to avoid getting confused by multiple calls.
>
> A tracing-only engine that just wants to see the syscall that is going to be done can just do:
>
>	if (utrace_resume_action(action) == UTRACE_STOP)
>		return UTRACE_REPORT;
>
> at the top of report_syscall_entry, so it just doesn't think about it until it thinks the call will go now through.

Systemtap currently doesn't support changing syscall arguments, if it does, obviously a few things would need to change. But, I think systemtap would probably fall here - only see the syscall that is actually going to be done. So systemtap could possibly get multiple callbacks for the same syscall, but only pay attention to the last one, correct?

> Say an engine has a different agenda, just to see what syscall argument values came in from user mode before someone else changes them. It does:
>
>	if (action & UTRACE_SYSCALL_RESUMED)
>		return UTRACE_RESUME;
>
> to ignore the additional callbacks that might come after somebody decided to stop and report. It just does its work on the first one.
>
> Here comes Renzo again! He wants to have two or three or nineteen layers of the first kind of Renzo engine: each one stops at syscall entry, then resumes after changing some registers.
> He wants these to nest, meaning that after the outermost one stops, fiddles, and resumes, the next one in stops, looks at the register as fiddled by the outermost guy, fiddles in a different way, and resumes, and on and on. Perhaps the first model (if last guy is stopping, punt to look again at resume report) works for that. Or perhaps the engine also needs to keep track with its own state flag it sets whenever it does its work, and then resets in exit tracing to prepare for next time.

... stuff deleted ...

> So, even I can't write that much text and still think this interface choice is simple to understand. But I kind of think it's around as simple as it can be for its mandates. I'd appreciate any feedback.

This is understandable, but does hurt my head a *little* bit. I think if you put the above full text somewhere and provided some examples this would make sense to people.

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
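The two callback styles discussed in this thread (act only on the syscall that will really run, versus act only on the values that first came in from user mode) can be sketched in plain C. This is a user-space mock, not kernel code: the numeric values of the enum, the resume mask, and the UTRACE_SYSCALL_RESUMED bit below are stand-ins chosen for illustration, utrace_resume_action() is modelled as a simple mask, and the `seen` counter is a hypothetical placeholder for whatever work an engine would do.

```c
/* Stand-in definitions for illustration only; these are NOT the
 * values a utrace-patched kernel defines in linux/utrace.h. */
enum {
    UTRACE_STOP   = 0,
    UTRACE_REPORT = 1,
    UTRACE_RESUME = 2
};
#define UTRACE_RESUME_MASK     0x0fu
#define UTRACE_SYSCALL_RESUMED 0x40u  /* assumed "repeat notification" bit */

/* Mock of utrace_resume_action(): extract the resume action bits. */
static unsigned utrace_resume_action(unsigned action)
{
    return action & UTRACE_RESUME_MASK;
}

/* "Final view" engine: if an earlier engine is stopping the thread,
 * ask for another reporting pass instead of looking at the syscall now. */
unsigned final_view_syscall_entry(unsigned action, int *seen)
{
    if (utrace_resume_action(action) == UTRACE_STOP)
        return UTRACE_REPORT;
    (*seen)++;              /* hypothetical work: inspect final arguments */
    return UTRACE_RESUME;
}

/* "First view" engine: do the work on the first callback only and
 * ignore repeat notifications after a stop and resume. */
unsigned first_view_syscall_entry(unsigned action, int *seen)
{
    if (action & UTRACE_SYSCALL_RESUMED)
        return UTRACE_RESUME;
    (*seen)++;              /* hypothetical work: inspect original arguments */
    return UTRACE_RESUME;
}
```

With one stop followed by one resumed pass, each engine does its work exactly once: the first-view engine on the initial callback, the final-view engine on the repeat.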
UTRACE_STOP in exec handler
Roland,

I'm seeing a problem with the new utrace on 2.6.27-0.323.rc6.fc10.x86_64. Basically, if I attach a new engine in an exec handler, and use utrace_control() with UTRACE_STOP, the task doesn't reliably stop.

I was seeing this behaviour with systemtap, so I started with crash-suspend.c and mangled it to do something similar, to hopefully simplify the problem down a bit.

So, try compiling this module, then insmod it with the pid of a bash process. What I expected to happen is that every new process exec'ed by that bash shell gets stopped once. Instead, ls will run to completion without stopping. If you run cat, then use Ctrl-C to kill it, you'll get a quiesce event.

So, is this a problem with the way I'm attempting to stop the thread or a utrace bug? (Note that I've tried returning UTRACE_STOP from the exec handler instead of calling utrace_control(), but it makes no difference.)

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/utrace.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/errno.h>

MODULE_DESCRIPTION("automatic suspend on crash");
MODULE_LICENSE("GPL");

static int target_pid;
static int verbose;

module_param_named(pid, target_pid, int, 0);
module_param(verbose, bool, 0);

#define MY_EVENTS	(UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXEC))
#define MY_EVENTS2	(UTRACE_EVENT(QUIESCE))

static u32
crash_suspend_quiesce(u32 action, struct utrace_attached_engine *engine,
		      struct task_struct *tsk, unsigned long event)
{
	if (tsk != NULL) {
		printk("pid %d quiesced\n", tsk->pid);
	}
	return UTRACE_DETACH;
}

static const struct utrace_engine_ops crash_suspend_ops2 =
{
	.report_quiesce = crash_suspend_quiesce,
};

/*
 * On clone, attach to the child.
 */
static u32
crash_suspend_clone(enum utrace_resume_action action,
		    struct utrace_attached_engine *engine,
		    struct task_struct *parent,
		    unsigned long clone_flags,
		    struct task_struct *child)
{
	struct utrace_attached_engine *child_engine;

	child_engine = utrace_attach_task(child, UTRACE_ATTACH_CREATE,
					  engine->ops, 0);
	if (IS_ERR(child_engine)) {
		printk("attach to clone child %d (%lx) from 0x%p = %ld\n",
		       child->pid, clone_flags, engine,
		       PTR_ERR(child_engine));
	} else {
		int err = utrace_set_events(child, child_engine, MY_EVENTS);
		WARN_ON(err);
		utrace_engine_put(child_engine);
	}

	return UTRACE_RESUME;
}

static u32
crash_suspend_exec(enum utrace_resume_action action,
		   struct utrace_attached_engine *engine,
		   struct task_struct *tsk,
		   const struct linux_binfmt *fmt,
		   const struct linux_binprm *bprm,
		   struct pt_regs *regs)
{
	struct utrace_attached_engine *engine2;

	engine2 = utrace_attach_task(tsk, UTRACE_ATTACH_CREATE,
				     &crash_suspend_ops2, 0);
	if (IS_ERR(engine2)) {
		printk("attach to exec %d from 0x%p = %ld\n",
		       tsk->pid, engine2, PTR_ERR(engine2));
	} else {
		int err = utrace_set_events(tsk, engine2, MY_EVENTS2);
		WARN_ON(err);

		err = utrace_control(tsk, engine2, UTRACE_STOP);
		printk("utrace_control(%d) returned %d\n",
		       (int)tsk->pid, err);
		utrace_engine_put(engine2);
	}

	return UTRACE_RESUME;
}

/*
 * If we are still attached at task death, it didn't die by core dump signal.
 * Just detach and let it go.
 */
static u32
crash_suspend_death(struct utrace_attached_engine *engine,
		    struct task_struct *tsk,
		    bool group_dead, int signal)
{
	return UTRACE_DETACH;
}

static const struct utrace_engine_ops crash_suspend_ops =
{
	.report_clone = crash_suspend_clone,
	.report_exec = crash_suspend_exec,
};

static int __init init_crash_suspend(void)
{
	struct pid *pid;
	struct utrace_attached_engine *engine;
	int ret;

	pid = find_get_pid(target_pid);
	if (pid == NULL) {
		printk("cannot find PID %d\n", target_pid);
		return -ESRCH;
	}

	engine = utrace_attach_pid(pid, UTRACE_ATTACH_CREATE,
				   &crash_suspend_ops, 0);
	if (IS_ERR(engine))
		printk("utrace_attach: %ld\n", PTR_ERR(engine));
	else if (engine == NULL)
		printk("utrace_attach = null!\n");
	else
		printk("attached to %d = 0x%p\n", pid_vnr(pid), engine);

	ret = utrace_set_events_pid(pid, engine, MY_EVENTS);
	if (ret == -ESRCH)
		printk("pid %d died during setup\n", pid_vnr(pid));
	else
		WARN_ON(ret);

	put_pid(pid);
	if (engine && !IS_ERR(engine))
		utrace_engine_put(engine);

	return 0;
}

static void __exit exit_crash_suspend(void)
{
	struct task_struct *t;
	struct utrace_attached_engine *engine;
	int n = 0;
	int ret;

restart:
	rcu_read_lock();
	for_each_process(t) {
		engine = utrace_attach_task(t, UTRACE_ATTACH_MATCH_OPS,
					    &crash_suspend_ops, 0);
		if (IS_ERR(engine)) {
			int error = -PTR_ERR(engine);
			if (error != ENOENT)
				printk("!!! utrace_attach returned %d on %d\n",
				       error, t->pid);
			continue;
		}

		ret = utrace_control(t, engine, UTRACE_DETACH);
		if (ret == -EINPROGRESS) {
			/*
			 * It's running our callback, so we have
utrace_set_events in quiesce handler?
In systemtap, we've changed to stopping a thread before setting up the events we're interested in (besides quiesce/death). So, basically, it looks like this (without much error handling):

// initial attach logic
// ... find an interesting thread ...
ops.report_quiesce = quiesce_handler;
ops.report_syscall_entry = syscall_entry_handler;
engine = utrace_attach_task(tsk, UTRACE_ATTACH_CREATE, &ops, data);
rc = utrace_set_events(tsk, engine,
		       (UTRACE_EVENT(DEATH) | UTRACE_STOP
			| UTRACE_EVENT(QUIESCE)));
// ... do other stuff ...

// quiesce handler
u32 quiesce_handler(enum utrace_resume_action action,
		    struct utrace_attached_engine *engine,
		    struct task_struct *tsk,
		    unsigned long event)
{
	int rc;

	// Turn off quiesce handling and turn on syscall handling
	rc = utrace_set_events(tsk, engine,
			       UTRACE_EVENT(DEATH)
			       | UTRACE_EVENT(SYSCALL_ENTRY));
	if (rc == -EINPROGRESS) {
		rc = utrace_barrier(tsk, engine);
		if (rc != 0)
			printk(KERN_ERR
			       "utrace_barrier returned error %d on pid %d",
			       rc, (int)tsk->pid);
		rc = utrace_set_events(tsk, engine,
				       UTRACE_EVENT(DEATH)
				       | UTRACE_EVENT(SYSCALL_ENTRY));
	}
	if (rc != 0)
		printk(KERN_ERR
		       "utrace_set_events returned error %d on pid %d",
		       rc, (int)tsk->pid);

	// ... do other stuff ...
	return UTRACE_RESUME;
}

The utrace_barrier() call always returns 0, but the utrace_set_events() calls always return -EINPROGRESS. I've put the -EINPROGRESS logic in a loop, but even after 10 iterations utrace_set_events() never succeeds. This is on 2.6.27-0.287.rc4.git7.fc10.x86_64.

So, am I doing this incorrectly? Or is there a bug here in utrace (where it doesn't expect to see a utrace_set_events() from within a handler on the same thread)?

If I'm doing this incorrectly, I'd like help in figuring out what I should be doing. (In the original utrace there was UTRACE_ACTION_NEWSTATE which allowed you to change the flags from the handler, but I haven't seen anything similar in the current utrace.)

Thanks for the help.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
Re: global tracing
Roland McGrath wrote:
> We've mentioned global tracing. I think it's time now to discuss it thoroughly and decide what we do or don't want to do.

...

> 2. Why do we want utrace global tracing?

From a systemtap point of view, we'd certainly use global tracing.

...

> 3. What would it look like?
>
> Global engines' callbacks all run after all per-task engine callbacks. (This could change in future.)

I guess in a perfect world callbacks would still be called in the order they were attached. But, if calling the global callbacks last makes things easier, I think systemtap could handle it.

> I had originally planned to rule out SYSCALL events for global tracing. The reason is that this is not like other event checks where a simple flag gets checked cheaply. Instead, it requires setting the low-level TIF_SYSCALL_TRACE on a thread, which makes it take a far slower path on system call entry and exit, and has a big impact on performance just from that alone. Global tracing has to set this individually on every thread, and then pay that big overhead across the board.

If we had utrace memory map tracing (I believe it is on your TODO list), systemtap wouldn't use global (or even per-thread) SYSCALL events as much.

...

> I'd kind of prefer to exclude REAP events for global tracing.

Currently systemtap only uses DEATH events, so I don't have much of an opinion there.

...

> 4. So, what's the plan?
>
> I need folks who might use global tracing to answer these questions:
>
> a. Do we want it?

Yes. Systemtap currently does global tracing now, in a manner similar to crash-suspend.c. The code looks for global CLONE, EXEC, and DEATH events, so systemtap knows when threads come and go. Once systemtap finds a process the user has told us he's interested in, it attaches some additional per-thread engine(s).

In the future, Frank has mentioned trying to do global memory map tracing, which would require global syscall tracing (or future global memory map tracing).

> b. Do we want it right now?

Yes. If you need beta testers, let me know.

> c. What justifies doing it in utrace (vs leaving it purely to tracepoints et al), to placate upstream critics?
>
> Please don't say, "That would be nice; your reasons sound good." That just does not help at all. The reasons in #2 above are ones I can think of, but I'm not arguing for them or for the feature. If you want the feature, *you* will be justifying it to the upstream critics. Let's here be as skeptical about adding the new complexity, before we decide on doing it, as our unsympathetic reviewers will be.

Global tracing would be *really* nice; your reasons sound *great*. How's that? :-)

Seriously, your reasons a. (Event vocabulary clearly aligned with utrace events), b. (Coordinated with per-task utrace callbacks), and d. (Kernel already has checks here, so almost free) apply most clearly to systemtap. Systemtap doesn't currently change outcomes in a callback, so reason c. doesn't apply much.

Systemtap is interested in performance impacts and the a./b. advantages seem quite obvious to me. Avoiding the complexities of manually attaching/detaching to every thread in the system seems important also.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
Re: crash-suspend teardown races
Roland McGrath wrote:
> Thanks, David. That is exactly the right example of using kernel synchronization primitives with callbacks to implement blocking behaviors you want.
>
> The wrinkle there is that you use UTRACE_INTERRUPT, which (potentially) perturbs the behavior of every traced thread. Doing this gives you a simple way to do synchronous detach and avoid those races. It's a prime example of why asynchronous detach is harder and we need to hash it out. What you've done is the only thing that's straightforward to do now, but it has one of the bad old side effects of ptrace (interrupting detach) that we need to eliminate to make the facility acceptable as the basis for pervasive tracing of many processes on the system.

Is there a way to avoid using UTRACE_INTERRUPT? Certainly I'd like to avoid disturbing the processes we're tracing.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
crash-suspend teardown races
For background to this email, read the teardown races section in utrace.txt and Roland's asynchronous detach email.

The crash-suspend.c example suffers from the teardown races problem. (Since systemtap's utrace code is a more elaborate version of crash-suspend.c, systemtap has the same problem.) So, I've attempted to come up with a solution that doesn't pervert the example too much. I've attached a patch with the details.

Basically, at module unload time, instead of detaching directly, it tries to asynchronously stop all the threads we're attached to and then let the quiesce handler detach from the thread.

The semi-tricky part was letting the module unload function, exit_crash_suspend(), know when all threads that we had attached to were detached. To do this, the code keeps up with an attach count. When the attach count reaches 0, the module unload function gets woken up to go ahead and exit.

I'd appreciate any thoughts, criticisms, etc. on this patch, which I've tested under kernel 2.6.27-0.186.rc0.git15.fc10.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

--- /home/dsmith/crash-suspend.c	2008-07-30 15:36:46.0 -0500
+++ crash-suspend2.c	2008-07-30 16:01:58.0 -0500
@@ -18,6 +18,14 @@ module_param(verbose, bool, 0);
 			 UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(JCTL) | \
 			 UTRACE_EVENT(QUIESCE))
 
+#define SHUTDOWN_EVENTS (UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(DEATH))
+#define CS_STARTING	0
+#define CS_STOPPING	1
+atomic_t state = ATOMIC_INIT(CS_STARTING);
+atomic_t attach_count = ATOMIC_INIT (0);
+
+static DECLARE_WAIT_QUEUE_HEAD(crash_suspend_wq);
+
 /*
  * This is the interesting hook.
  */
@@ -61,6 +69,13 @@ crash_suspend_quiesce(u32 action, struct
 	 */
 	if (!event)
 		engine->data = NULL;
+
+	if (atomic_read(&state) == CS_STOPPING) {
+		if (atomic_dec_return(&attach_count) <= 0) {
+			wake_up(&crash_suspend_wq);
+		}
+		return UTRACE_DETACH;
+	}
 	return UTRACE_RESUME;
 }
 
@@ -85,7 +100,7 @@ crash_suspend_jctl(enum utrace_resume_ac
 			 * proper weirdo status, stop the rest of the
 			 * process group too in normal job control fashion.
 			 */
-			(void) kill_pgrp(find_pid(-pgid), SIGTTOU, 1);
+			(void) kill_pgrp(find_vpid(-pgid), SIGTTOU, 1);
 	} else if (engine->data) {
 		/*
 		 * We've been resumed after a crash.
@@ -117,6 +132,7 @@ crash_suspend_clone(enum utrace_resume_a
 	} else {
 		int err = utrace_set_events(child, child_engine, MY_EVENTS);
 		WARN_ON(err);
+		atomic_inc(&attach_count);
 	}
 
 	return UTRACE_RESUME;
@@ -131,13 +147,21 @@ crash_suspend_death(struct utrace_attach
 		    struct task_struct *tsk,
 		    bool group_dead, int signal)
 {
+	if (atomic_read(&state) == CS_STOPPING) {
+		if (atomic_dec_return(&attach_count) <= 0) {
+			wake_up(&crash_suspend_wq);
+		}
+	}
+	else {
+		atomic_dec(&attach_count);
+	}
 	return UTRACE_DETACH;
 }
 
-
 static const struct utrace_engine_ops crash_suspend_ops =
 {
 	.report_clone = crash_suspend_clone,
+	.report_quiesce = crash_suspend_quiesce,
 	.report_death = crash_suspend_death,
 	.report_signal = crash_suspend_signal,
 	.report_jctl = crash_suspend_jctl,
@@ -151,7 +175,7 @@ static int __init init_crash_suspend(voi
 	int ret;
 
 	rcu_read_lock();
-	target = find_task_by_pid(target_pid);
+	target = find_task_by_vpid(target_pid);
 	if (target)
 		get_task_struct(target);
 	rcu_read_unlock();
@@ -173,8 +197,10 @@ static int __init init_crash_suspend(voi
 	ret = utrace_set_events(target, engine, MY_EVENTS);
 	if (ret == -ESRCH)
 		printk("pid %d died during setup\n", target->pid);
-	else
+	else {
+		atomic_inc(&attach_count);
 		WARN_ON(ret);
+	}
 	WARN_ON(atomic_dec_and_test(&target->usage));
 
 	return 0;
@@ -186,6 +212,7 @@ static void __exit exit_crash_suspend(vo
 	struct utrace_attached_engine *engine;
 	int n = 0;
 
+	atomic_set(&state, CS_STOPPING);
 	rcu_read_lock();
 	for_each_process(t) {
 		engine = utrace_attach(t, UTRACE_ATTACH_MATCH_OPS,
@@ -197,14 +224,33 @@ static void __exit exit_crash_suspend(vo
 				       error, t->pid);
 		}
 		else {
-			int ret = utrace_control(t, engine, UTRACE_DETACH);
+			int ret = utrace_set_events(t, engine,
+						    SHUTDOWN_EVENTS
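The attach-count idea described in the teardown-races message can be shown in isolation. The sketch below is a user-space analogue built on C11 atomics, not the kernel patch itself: a counter tracks attached threads, and once shutdown has begun, the detach that brings the count to zero is the one that should wake the unloader. The struct, the function names, and the `woken` flag standing in for wake_up() on a wait queue are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { CS_STARTING = 0, CS_STOPPING = 1 };

/* Hypothetical bookkeeping for illustration. */
struct tracker {
    atomic_int state;         /* CS_STARTING or CS_STOPPING */
    atomic_int attach_count;  /* number of threads still attached */
    atomic_bool woken;        /* stand-in for wake_up() on a wait queue */
};

void tracker_init(struct tracker *t)
{
    atomic_init(&t->state, CS_STARTING);
    atomic_init(&t->attach_count, 0);
    atomic_init(&t->woken, false);
}

/* Called whenever an engine is attached to a thread. */
void tracker_attach(struct tracker *t)
{
    atomic_fetch_add(&t->attach_count, 1);
}

/* Called by the unload path before it asks threads to stop. */
void tracker_begin_shutdown(struct tracker *t)
{
    atomic_store(&t->state, CS_STOPPING);
}

/* Called from the quiesce or death path when an engine detaches.
 * Returns true if this detach was the one that signalled the waiter. */
bool tracker_detach(struct tracker *t)
{
    /* atomic_fetch_sub returns the old value, so subtract one more. */
    int remaining = atomic_fetch_sub(&t->attach_count, 1) - 1;

    if (atomic_load(&t->state) == CS_STOPPING && remaining <= 0) {
        atomic_store(&t->woken, true);
        return true;
    }
    return false;
}
```

An unloader built on this would presumably also check the count itself before sleeping, since no detach ever arrives if nothing was attached when shutdown began.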
Re: utrace doc update, nearing submission(?)
Roland McGrath wrote:
>> I've read through most of the docs (although my eyes certainly glazed over when I got to the tracehook stuff).
>
> That stuff is for kernel maintainers, as it says. You don't really need to know. For that part, just checking for typos is all the help it needs.
>
>> What's there seems to be pretty good. I think it is missing (or at least I missed it) a good description of how to asynchronously stop a thread. What are the ins-and-outs here, knowing how/when to use UTRACE_STOP vs. UTRACE_INTERRUPT, etc.
>
> Yeah, the kerneldoc-driven format doesn't lend itself to a lot of exposition. The details are in there in the utrace_control description. But I guess it's not real obvious how to put it all together. How is the Stopping Safely section? That seems like the place to add something more direct about this. Was there something other than the UTRACE_INTERRUPT issue that didn't seem clear? Is that not clear to you, or is it just not clear in the documentation?

Probably both - see below.

> You use UTRACE_INTERRUPT when you want to interfere with system calls in progress (or blocking page faults or whatever). To interrupt and stop (like PTRACE_ATTACH does) and handle it already being stopped, you really want to do UTRACE_STOP first and see 0 if it's already stopped. If it's not already stopped, you get -EINPROGRESS and then can do UTRACE_INTERRUPT to be sure that you interrupt it and that the race complexity of it being stopped before or waking up is minimized. (If you just do UTRACE_INTERRUPT first, then you'll get a callback soon--unless it's stopped. Then you won't, but if you do UTRACE_STOP second to see if it's stopped, then you could get the callback in between and your life gets more hairy.)

Ah. That paragraph makes lots of sense - I wish it could be included somewhere. So, basically, you do something like this:

	rc = utrace_control(t, engine, UTRACE_STOP);
	if (rc == -EINPROGRESS) {
		rc = utrace_control(t, engine, UTRACE_INTERRUPT);
	}

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
crash-suspend.c using new interface
To test out the new utrace interface (I was using kernel 2.6.26-138.fc10.x86_64), I ported the crash-suspend.c example from the old interface to the new interface. It is attached to this email. Feel free to ignore/delete the extra debug printk's.

When run, it does what it is supposed to do, but triggers a BUG_ON message when the crashed/suspended process is put back in the foreground. Note that I don't have any real feel for whether the bug lies in my crash-suspend.c translation or the new utrace itself. The BUG details are in msg.txt.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/utrace.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/errno.h>

MODULE_DESCRIPTION("automatic suspend on crash");
MODULE_LICENSE("GPL");

static int target_pid;
static int verbose;

module_param_named(pid, target_pid, int, 0);
module_param(verbose, bool, 0);

#define MY_EVENTS	(UTRACE_EVENT(CLONE) | UTRACE_EVENT(DEATH) \
			 | UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(JCTL))

/*
 * This is the interesting hook.
 */
static u32
crash_suspend_signal(u32 action, struct utrace_attached_engine *engine,
		     struct task_struct *tsk, struct pt_regs *regs,
		     siginfo_t *info,
		     const struct k_sigaction *orig_ka,
		     struct k_sigaction *return_ka)
{
	printk("%s:%d action = 0x%x\n", __FUNCTION__, __LINE__, action);
	if (info->si_errno & 0x8000) {
		info->si_errno &= ~0x8000;
		printk("%s:%d\n", __FUNCTION__, __LINE__);
		return UTRACE_RESUME;
	}

	/*
	 * If another engine is doing something, just get out of the way.
	 */
	if ((action & UTRACE_SIGNAL_MASK) != UTRACE_SIGNAL_CORE)
		return UTRACE_RESUME;

	printk("%s:%d\n", __FUNCTION__, __LINE__);
	info->si_errno |= 0x8000;
	return UTRACE_SIGNAL_TSTP | UTRACE_SIGNAL_HOLD;
}

static u32
crash_suspend_jctl(enum utrace_resume_action action,
		   struct utrace_attached_engine *engine,
		   struct task_struct *tsk, bool notify, int type)
{
	printk("%s:%d type = 0x%x\n", __FUNCTION__, __LINE__, type);
	if (type == CLD_STOPPED) {
		int signr = tsk->exit_code;
		pid_t pgid = task_pgrp_nr(tsk);
		if (verbose)
			printk("crash-suspend stopped pgrp %d for pid %d signal %d\n",
			       pgid, tsk->pid, signr);
		if (signr != SIGSTOP && signr != SIGTSTP &&
		    signr != SIGTTOU && signr != SIGTTIN)
			/*
			 * This is an unnatural stop induced by us, above.
			 * Now that we have ourselves stopped with the
			 * proper weirdo status, stop the rest of the
			 * process group too in normal job control fashion.
			 */
			(void) kill_pgrp(find_pid(-pgid), SIGTTOU, 1);
	}

	return UTRACE_RESUME;
}

/*
 * On clone, attach to the child.
 */
static u32
crash_suspend_clone(enum utrace_resume_action action,
		    struct utrace_attached_engine *engine,
		    struct task_struct *parent,
		    unsigned long clone_flags,
		    struct task_struct *child)
{
	struct utrace_attached_engine *child_engine;

	printk("%s:%d\n", __FUNCTION__, __LINE__);
	child_engine = utrace_attach(child, UTRACE_ATTACH_CREATE,
				     engine->ops, 0);
	if (IS_ERR(child_engine)) {
		printk("attach to clone child %d (%lx) from 0x%p = %ld\n",
		       child->pid, clone_flags, engine,
		       PTR_ERR(child_engine));
	} else
		utrace_set_events(child, child_engine, MY_EVENTS);

	return UTRACE_RESUME;
}

/*
 * If we are still attached at task death, it didn't die by core dump signal.
 * Just detach and let it go.
 */
static u32
crash_suspend_death(struct utrace_attached_engine *engine,
		    struct task_struct *tsk,
		    bool group_dead, int signal)
{
	printk("%s:%d\n", __FUNCTION__, __LINE__);
	return UTRACE_DETACH;
}

static const struct utrace_engine_ops crash_suspend_ops =
{
	.report_clone = crash_suspend_clone,
	.report_death = crash_suspend_death,
	.report_signal = crash_suspend_signal,
	.report_jctl = crash_suspend_jctl,
};

static int __init init_crash_suspend(void)
{
	struct task_struct *target;
	struct utrace_attached_engine *engine;

	rcu_read_lock();
	target = find_task_by_pid(target_pid);
	if (target)
		get_task_struct(target);
	rcu_read_unlock();

	if (target == NULL) {
		printk("cannot find PID %d\n", target_pid);
		return -ESRCH;
	}

	engine = utrace_attach(target, UTRACE_ATTACH_CREATE,
			       &crash_suspend_ops, 0);
	if (IS_ERR(engine))
		printk("utrace_attach: %ld\n", PTR_ERR(engine));
	else if (engine == NULL)
		printk("utrace_attach = null!\n");
	else
		printk("attached to %d = 0x%p\n", target->pid, engine);

	utrace_set_events(target, engine, MY_EVENTS);
	WARN_ON(atomic_dec_and_test(&target->usage));

	return 0;
}

static void __exit exit_crash_suspend(void)
{
	struct task_struct *t;
	struct utrace_attached_engine *engine;
	int n = 0;

	rcu_read_lock();
	for_each_process(t) {
		engine = utrace_attach(t, UTRACE_ATTACH_MATCH_OPS,
				       &crash_suspend_ops, 0);
		if (IS_ERR(engine)) {
			int error = -PTR_ERR(engine);
			if (error != ENOENT)
				printk("!!! utrace_attach returned %d on %d\n
Re: Tracing Syscalls under Fedora 9
Martin Süßkraut wrote:
> Hi,
>
> has the tracing of system calls changed in utrace between Fedora 8 and 9?
> My module works fine under Fedora 8, but under Fedora 9 the callbacks
> report_syscall_entry and report_syscall_exit seem not to be invoked any more.

If it helps at all, systemtap is seeing the same thing: system call callbacks that got called under F8 aren't getting called under F9.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
need quiesce help
In systemtap (http://sourceware.org/systemtap/), we're using utrace to be able to put systemtap probes in arbitrary threads. Internally, there are two fairly separate and distinct layers, each using utrace on the same thread. This has worked well. However, I'm having trouble when each layer requests UTRACE_ACTION_QUIESCE.

The upper layer does:

    utrace_set_flags(tsk, engine, UTRACE_ACTION_QUIESCE|UTRACE_EVENT(QUIESCE));

Then, in the .report_quiesce handler, it does its processing and returns UTRACE_ACTION_DETACH. The upper layer is working correctly.

The lower layer does:

    utrace_set_flags(tsk, engine, UTRACE_ACTION_QUIESCE|UTRACE_EVENT(QUIESCE)|UTRACE_EVENT(DEATH));

After the lower layer's .report_quiesce handler is called, I'd like to turn off quiesce handling for this engine and leave UTRACE_EVENT(DEATH) handling on. I've tried various combinations of return values and calls to utrace_set_flags() in the .report_quiesce handler, but the handler always seems to get called twice (when the upper layer also requests UTRACE_ACTION_QUIESCE).

So, what does the lower layer need to do to correctly turn off quiesce handling for the engine in this case? Note that both layers' engines get installed on the thread in the same callback.

Thanks for the help.

--
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)