Re: resuming after stop at syscall_entry
Roland McGrath wrote:

This processing makes sense I think. It is a bit complicated of course, but not unnecessarily so.

Glad to hear it!

A tracing-only engine that just wants to see the syscall that is going to be done can just do:

    if (utrace_resume_action(action) == UTRACE_STOP)
        return UTRACE_REPORT;

at the top of report_syscall_entry, so it just doesn't think about it until it thinks the call will now go through.

Systemtap currently doesn't support changing syscall arguments; if it does, obviously a few things would need to change. But I think systemtap would probably fall here: it only wants to see the syscall that is actually going to be done. So systemtap could possibly get multiple callbacks for the same syscall, but only pay attention to the last one, correct?

Correct. The advice quoted above is what its callbacks would do to ignore the callbacks before the last one. Note that you can only be sure you're seeing the actually-going-to-be-done state if yours is the first engine attached. (Thus, by the new special-case calling order, yours will be the last report_syscall_entry callback to run.) This is just the general engine priority thing, not anything new.

In cases like ptrace and kmview (Renzo's thing), even if these engines are first (i.e. called after yours), you will still be seeing the final state, because they made their changes asynchronously before resuming. But some other engine might make its changes directly in its own callback instead (whether it used UTRACE_STOP and got a repeat callback, or just did so the first time through without stopping), so those changes would happen only after your last callback.

In the same vein, earlier engines (i.e. here called after yours) might use UTRACE_STOP after your first callback had every reason to believe it was the last one (i.e. that the if did not hit). In that case, you will get a repeat call (with the UTRACE_SYSCALL_RESUMED flag).
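To make the tracing-only advice concrete, here is a minimal user-space model of that callback. It is a sketch under stated assumptions, not kernel code: the enum values, the UTRACE_RESUME_MASK and UTRACE_SYSCALL_RESUMED bit values, and the body of utrace_resume_action() are stand-ins rather than the real utrace definitions, and the callback is reduced to an action word plus a syscall number instead of the real argument list. trace_final_entry() and last_traced are hypothetical names.

```c
#include <assert.h>

/* Stand-ins for utrace names so this model builds in user space.
 * The values are assumptions, not the kernel's definitions. */
enum utrace_resume_action { UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME };
#define UTRACE_RESUME_MASK     0x0fu
#define UTRACE_SYSCALL_RESUMED 0x100u

/* Model of the helper: strip flag bits, keep the resume action. */
static enum utrace_resume_action utrace_resume_action(unsigned int action)
{
	return (enum utrace_resume_action)(action & UTRACE_RESUME_MASK);
}

/* Hypothetical engine state: the syscall number we last traced. */
static long last_traced = -1;

/* Tracing-only report_syscall_entry: while an earlier engine still
 * holds UTRACE_STOP, do nothing except ask for another report once
 * that engine resumes the thread.  Trace only when no stop is pending,
 * i.e. when the call looks like it will now go through. */
static unsigned int trace_final_entry(unsigned int action, long syscall_nr)
{
	if (utrace_resume_action(action) == UTRACE_STOP)
		return UTRACE_REPORT;
	assert(syscall_nr >= 0);	/* syscall numbers are non-negative here */
	last_traced = syscall_nr;
	return UTRACE_RESUME;
}
```

Called first with UTRACE_STOP pending, it records nothing and returns UTRACE_REPORT; on the repeat call, where the resume action is no longer UTRACE_STOP, it records the syscall.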
On that call, you need to cope with the fact that you already did your entry tracing work before (but now things may have changed). If the theory is that you want to respect your place in the engine order, whatever that is (i.e., if your tracing just reported a lie, it was the lie you were supposed to believe), then coping just means ignoring the repeat. (This is no different in kind from an earlier engine/later callback changing the registers after your callback and never stopping.)

For that you need to keep track of whether you already handled it or not. (Depending on your relative order and the actions of the other engines, you might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or after you handled it, so you can't use those alone.) You can do this in two ways. One is to use your own per-thread state (engine->data, etc.). The other is to disable the SYSCALL_ENTRY event when you've handled it, so you won't get more callbacks. Then you can re-enable the event in your report_syscall_exit callback (or report_quiesce/report_signal, or whatever is most convenient to be sure you'll run before it goes back to user mode), i.e., use utrace_set_events() from the callbacks.

It sounds like disabling SYSCALL_ENTRY then re-enabling it in the report_syscall_exit() callback is a reasonable way to go.

This is understandable, but does hurt my head a *little* bit. I think if you put the above full text somewhere and provided some examples this would make sense to people.

The utrace-syscall-resumed branch puts this in the kerneldoc text for struct utrace_engine_ops (where callback return values and common arguments are described):

 * When %UTRACE_STOP is used in @report_syscall_entry, then @task
+ * stops before attempting the system call.  In this case, another
+ * @report_syscall_entry callback follows after @task resumes; in a
+ * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
+ * @action argument to indicate a repeat callback still waiting to
+ * attempt the same system call invocation.  This repeat callback
+ * gives each engine an opportunity to reexamine registers another
+ * engine might have changed while @task was held in %UTRACE_STOP.
+ *
+ * In other cases, the resume action does not take effect until @task
+ * is ready to check for signals and return to user mode.  If there
+ * are more callbacks to be made, the last round of calls determines
+ * the final action.  A @report_quiesce callback with @event zero, or
+ * a @report_signal callback, will always be the last one made before
+ * @task resumes.  Only %UTRACE_STOP is sticky--if @engine returned
+ * %UTRACE_STOP then @task stays stopped unless @engine returns
+ * different from a following callback.

I don't know where the longer explanation and/or examples belong. Perhaps in a new section in utrace.tmpl? We could start by putting together some text on the wiki. Another idea is to add a few example modules in samples/utrace/.
Re: resuming after stop at syscall_entry
Roland McGrath wrote:

This processing makes sense I think. It is a bit complicated of course, but not unnecessarily so. I'd like to ask you how this stuff would relate to systemtap (so I've added the systemtap mailing list). I've interspersed a few comments/questions below.

... stuff deleted ...

SYSCALL_ENTRY is unlike all other events. Right after this callback loop is when the important user-visible stuff happens (the system call). So we stop immediately there as for the other two. But if another engine used UTRACE_STOP and maybe did something asynchronously, like modifying the syscall argument registers, you get no opportunity to see what happened. Once all engines lift UTRACE_STOP, the system call runs.

... stuff deleted ...

As explained above, the norm for interacting with other engines and their use of UTRACE_STOP is to use the final report. When your callback's action argument includes UTRACE_STOP, you know an earlier engine might be fiddling before the thread resumes. So your callback can decide to return UTRACE_REPORT. That ensures that some report_quiesce (or report_signal/UTRACE_SIGNAL_REPORT) callback will be made after the other engine lifts its UTRACE_STOP and before the thread returns to user mode. At that point, you can see what user register values it might have installed, etc. In all events but syscall entry, a final report_quiesce(0) serves this need.

My proposal is to extend this resume report approach to the syscall entry case. That is, after some report_syscall_entry callback returned UTRACE_STOP so we've stopped, allow for a second reporting pass after we've been resumed, before running the system call. You'd get this pass if someone used UTRACE_REPORT. That is, in the first callback loop, one engine used UTRACE_STOP and another used UTRACE_REPORT. Then when the first engine used utrace_control() to resume, there would be a second reporting pass because of the second engine's earlier request.
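The proposed extra reporting passes can be illustrated with a small user-space model of the callback loop. It is a sketch under assumptions, not the utrace implementation: the constants and their ordering (a lower value is treated as more intrusive and wins when actions are combined), the reduced one-argument callback type, the resume_with parameter standing in for the stopping engine's utrace_control() call, and the stopper()/tracer() engines are all hypothetical. Engines are called in array order, so the last array entry plays the role of the engine whose callback runs last.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for utrace names; values are assumptions of this model.
 * Lower values are treated as more intrusive, so they win when combined. */
enum utrace_resume_action { UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME };
#define UTRACE_RESUME_MASK     0x0fu
#define UTRACE_SYSCALL_RESUMED 0x100u

/* Reduced callback: takes the action accumulated so far, returns one. */
typedef unsigned int (*entry_cb)(unsigned int action);

static unsigned int combine(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Call every engine in array order.  If the pass ends in UTRACE_STOP,
 * the thread stops and is later resumed with resume_with.  A repeat
 * pass, with UTRACE_SYSCALL_RESUMED set, follows only if the thread
 * stopped and someone asked for UTRACE_REPORT.  Returns the number of
 * passes made before the system call would run. */
static int run_entry_passes(const entry_cb *engines, size_t n,
			    unsigned int resume_with)
{
	unsigned int flag = 0;	/* no repeat flag on the first pass */
	int passes = 0;

	for (;;) {
		unsigned int action = UTRACE_RESUME;
		int report = 0;
		size_t i;

		passes++;
		assert(passes <= 16);	/* guard against an engine that never lets go */
		for (i = 0; i < n; i++) {
			unsigned int ret = engines[i](action | flag);

			if (ret == UTRACE_REPORT)
				report = 1;
			action = combine(action, ret);
		}
		if (action != UTRACE_STOP)
			return passes;	/* nobody stopped: the syscall runs now */
		if (resume_with == UTRACE_REPORT)
			report = 1;
		if (!report)
			return passes;	/* resumed, no report wanted: syscall runs */
		flag = UTRACE_SYSCALL_RESUMED;
	}
}

/* Hypothetical engine that stops on the first pass only, like one that
 * edits registers asynchronously and then resumes. */
static unsigned int stopper(unsigned int action)
{
	return (action & UTRACE_SYSCALL_RESUMED) ? UTRACE_RESUME : UTRACE_STOP;
}

/* Hypothetical tracing-only engine: waits while a stop is pending. */
static int traced;
static unsigned int tracer(unsigned int action)
{
	if ((action & UTRACE_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;
	traced++;
	return UTRACE_RESUME;
}
```

With { stopper, tracer }, the tracer defers on the first pass and records the syscall once on the second. With { tracer, stopper }, the tracer records on the first pass, before the stopper gets to act, and no second pass is made because nobody asked for UTRACE_REPORT; this is the engine-order caveat in miniature.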
Even with just one engine, if it used UTRACE_STOP and then used utrace_control(UTRACE_REPORT) to resume, it would get the second reporting pass. If someone uses UTRACE_STOP+UTRACE_REPORT in that pass, there would be a third pass, etc. What I have in mind is that the second (and any later) pass would just be another report_syscall_entry callback to every engine with UTRACE_EVENT(SYSCALL_ENTRY) set. A flag bit in the action argument says this is a repeat notification. I think this strikes a decent balance between not adding more callbacks and arguments that bloat the API in general, and imposing a fairly simple burden on engines to avoid getting confused by multiple calls.

A tracing-only engine that just wants to see the syscall that is going to be done can just do:

    if (utrace_resume_action(action) == UTRACE_STOP)
        return UTRACE_REPORT;

at the top of report_syscall_entry, so it just doesn't think about it until it thinks the call will now go through.

Systemtap currently doesn't support changing syscall arguments; if it does, obviously a few things would need to change. But I think systemtap would probably fall here: it only wants to see the syscall that is actually going to be done. So systemtap could possibly get multiple callbacks for the same syscall, but only pay attention to the last one, correct?

Say an engine has a different agenda: it just wants to see what syscall argument values came in from user mode before someone else changes them. It does:

    if (action & UTRACE_SYSCALL_RESUMED)
        return UTRACE_RESUME;

to ignore the additional callbacks that might come after somebody decided to stop and report. It just does its work on the first one.

Here comes Renzo again! He wants to have two or three or nineteen layers of the first kind of Renzo engine: each one stops at syscall entry, then resumes after changing some registers.
He wants these to nest, meaning that after the outermost one stops, fiddles, and resumes, the next one in stops, looks at the registers as fiddled by the outermost guy, fiddles in a different way, and resumes, and so on. Perhaps the first model (if the last guy is stopping, punt to look again at the resume report) works for that. Or perhaps the engine also needs to keep track with its own state flag that it sets whenever it does its work, and then resets in exit tracing to prepare for next time.

... stuff deleted ...

So, even I can't write that much text and still think this interface choice is simple to understand. But I kind of think it's about as simple as it can be for its mandates. I'd appreciate any feedback.

This is understandable, but does hurt my head a *little* bit. I think if you put the above full text somewhere and provided some examples this would make sense to people.

--
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)