Re: resuming after stop at syscall_entry

2009-04-28 Thread David Smith
Roland McGrath wrote:
 This processing makes sense I think.  It is a bit complicated of course,
 but not unnecessarily so.
 
 Glad to hear it!
 
 A tracing-only engine that just wants to see the syscall that is going
 to be done can just do:

   if (utrace_resume_action(action) == UTRACE_STOP)
           return UTRACE_REPORT;

 at the top of report_syscall_entry, so it just doesn't think about it
 until it thinks the call will now go through.

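A userspace sketch of the "last look" check above may help. It is not kernel code. Only the names UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME, UTRACE_SYSCALL_RESUMED and utrace_resume_action() come from the thread. The numeric values, MOCK_RESUME_MASK and the traced_calls counter are placeholders invented for illustration. The real definitions live in the utrace headers.

```c
/*
 * Userspace mock of the utrace action word.  Names follow the thread;
 * the numeric values are placeholders, NOT the kernel's.
 */
enum { UTRACE_STOP = 0, UTRACE_REPORT = 2, UTRACE_RESUME = 5 };
#define MOCK_RESUME_MASK       0x0fu  /* assumed: low bits hold the resume action */
#define UTRACE_SYSCALL_RESUMED 0x100u /* assumed: a flag bit above that mask */

/* Mock of utrace_resume_action(): strip flag bits, keep the action. */
static unsigned int utrace_resume_action(unsigned int action)
{
	return action & MOCK_RESUME_MASK;
}

/* Hypothetical stand-in for "record this syscall". */
static int traced_calls;

/*
 * "Last look" policy: if another engine has already asked for
 * UTRACE_STOP, the registers may still change while the thread is
 * stopped, so ask for another report instead of tracing now.
 */
static unsigned int last_look_syscall_entry(unsigned int action)
{
	if (utrace_resume_action(action) == UTRACE_STOP)
		return UTRACE_REPORT;
	traced_calls++;  /* nobody is stopping: this looks like the final state */
	return UTRACE_RESUME;
}
```

A first call whose action carries UTRACE_STOP returns UTRACE_REPORT without tracing. A later repeat call carrying UTRACE_RESUME | UTRACE_SYSCALL_RESUMED is the one that gets recorded.
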
 Systemtap currently doesn't support changing syscall arguments; if it ever
 does, obviously a few things would need to change.

 But I think systemtap would probably fall into this category: it only wants
 to see the syscall that is actually going to be done.  So systemtap could
 get multiple callbacks for the same syscall, but only pay attention to the
 last one, correct?
 
 Correct.  The advice quoted above is what its callbacks would do to ignore
 the callbacks before the last one.
 
 Note that you'll only be sure you're seeing the actually-going-to-be-done
 state if yours is the first engine attached.  (Thus, by the new special-case
 calling order, its callback will be the last report_syscall_entry callback
 to run.)  This is just the general engine priority thing, not anything new.
 
 In cases like ptrace and kmview (Renzo's thing), even if these engines are
 first (i.e. called after yours), you will still be seeing the final state
 because they did their changes asynchronously before resuming.  But some
 other engine might do its changes directly in its own callback instead
 (whether it used UTRACE_STOP and got a repeat callback, or just on the
 first time through without stopping), so those changes would happen only
 after your last callback.
 
 In the same vein, earlier engines (i.e. here called after yours) might
 use UTRACE_STOP after your first callback had every reason to believe it
 was the last one (i.e. the UTRACE_STOP test did not hit).  In that case, you
 will get a repeat call (with the UTRACE_SYSCALL_RESUMED flag set).  On that
 call, you need to cope with the fact that you already did your entry tracing
 work before (but now things may have changed).
 
 If the theory is that you want to respect your place in the engine order,
 whatever that is (i.e., if your tracing just reported a lie, it was the lie
 you were supposed to believe), then coping just means ignoring the
 repeat.  (This is no different in kind from an earlier engine/later
 callback changing the registers after your callback and never stopping.)
 
 For that you need to keep track of whether you already handled it or not.
 (Depending on your relative order and the actions of the other engines, you
 might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or
 after you handled it.  So you can't use those alone.)  You can do this in
 two ways.  One is to use your own per-thread state (engine-data, etc.).
 The other is to disable the SYSCALL_ENTRY event when you've handled it, so
 you won't get more callbacks.  Then you can re-enable the event in your
 report_syscall_exit callback (or report_quiesce/report_signal, or whatever
 is most convenient to be sure you'll run before it goes back to user mode).
 That is, call utrace_set_events() from the callbacks.
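Here is a rough userspace sketch of the second approach (disable the event once handled, re-enable it at syscall exit), combined with the "last look" test quoted above. struct mock_engine, the EV_* bits and mock_set_events() are hypothetical stand-ins. In real code the mask would be built with UTRACE_EVENT() and the update would go through utrace_set_events(), whose exact arguments are not reproduced here. The constant values are placeholders.

```c
enum { UTRACE_STOP = 0, UTRACE_REPORT = 2, UTRACE_RESUME = 5 };
#define MOCK_RESUME_MASK 0x0fu  /* assumed: low bits hold the resume action */

/* Placeholder event bits; the real mask is built with UTRACE_EVENT(). */
#define EV_SYSCALL_ENTRY 0x1ul
#define EV_SYSCALL_EXIT  0x2ul

struct mock_engine {
	unsigned long events;  /* events this engine currently wants */
	int entries_traced;    /* syscall entries actually recorded */
};

/* Stand-in for utrace_set_events(): only rewrites the mask here. */
static void mock_set_events(struct mock_engine *e, unsigned long events)
{
	e->events = events;
}

static unsigned int on_syscall_entry(struct mock_engine *e, unsigned int action)
{
	if ((action & MOCK_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;  /* not final yet: ask to look again */
	e->entries_traced++;
	/* Handled: turn off entry reports so repeat passes skip us. */
	mock_set_events(e, e->events & ~EV_SYSCALL_ENTRY);
	return UTRACE_RESUME;
}

static unsigned int on_syscall_exit(struct mock_engine *e)
{
	/* Re-arm before the thread gets back to user mode. */
	mock_set_events(e, e->events | EV_SYSCALL_ENTRY);
	return UTRACE_RESUME;
}

/* Mock core: only calls the entry hook while the event is enabled. */
static unsigned int deliver_entry(struct mock_engine *e, unsigned int action)
{
	if (!(e->events & EV_SYSCALL_ENTRY))
		return UTRACE_RESUME;
	return on_syscall_entry(e, action);
}
```

With this shape, a repeat pass that arrives after the engine has already recorded the syscall never reaches its callback. The engine therefore does not need to decode UTRACE_STOP or UTRACE_SYSCALL_RESUMED to tell "before" from "after".
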

It sounds like disabling SYSCALL_ENTRY then re-enabling it in the
report_syscall_exit() callback is a reasonable way to go.

 This is understandable, but does hurt my head a *little* bit.  I think
 if you put the above full text somewhere and provided some examples this
 would make sense to people.
 
 The utrace-syscall-resumed branch puts this in the kerneldoc text for
 struct utrace_engine_ops (where callback return values and common arguments
 are described):
 
   * When %UTRACE_STOP is used in @report_syscall_entry, then @task
 + * stops before attempting the system call.  In this case, another
 + * @report_syscall_entry callback follows after @task resumes; in a
 + * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
 + * @action argument to indicate a repeat callback still waiting to
 + * attempt the same system call invocation.  This repeat callback
 + * gives each engine an opportunity to reexamine registers another
 + * engine might have changed while @task was held in %UTRACE_STOP.
 + *
 + * In other cases, the resume action does not take effect until @task
 + * is ready to check for signals and return to user mode.  If there
 + * are more callbacks to be made, the last round of calls determines
 + * the final action.  A @report_quiesce callback with @event zero, or
 + * a @report_signal callback, will always be the last one made before
 + * @task resumes.  Only %UTRACE_STOP is sticky--if @engine returned
 + * %UTRACE_STOP then @task stays stopped unless @engine returns
 + * different from a following callback.
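To make the repeat-callback rule concrete, here is a small userspace model of one syscall entry with two engines. Everything in it is hypothetical:

- the placeholder constants;
- the engine table, which is listed in callback order;
- the assumption that the most restrictive (numerically smallest) action wins;
- debugger_edits(), which stands in for an engine that edits registers while the thread is held in UTRACE_STOP and then resumes it.

Following the kerneldoc text, the model runs another pass, with UTRACE_SYSCALL_RESUMED set, whenever a pass ended in UTRACE_STOP.

```c
#include <stdbool.h>

enum { UTRACE_STOP = 0, UTRACE_REPORT = 2, UTRACE_RESUME = 5 };
#define MOCK_RESUME_MASK       0x0fu  /* placeholder values, not the kernel's */
#define UTRACE_SYSCALL_RESUMED 0x100u

typedef unsigned int (*entry_cb)(unsigned int action, long *arg0, void *state);

struct mock_engine {
	entry_cb report_syscall_entry;
	void *state;
};

/* Runs passes until no engine stops; returns the number of passes. */
static int mock_syscall_entry(struct mock_engine *engines, int n, long *arg0,
			      void (*on_stopped)(long *arg0))
{
	unsigned int flags = 0;
	int rounds = 0;

	for (;;) {
		unsigned int action = UTRACE_RESUME;

		rounds++;
		for (int i = 0; i < n; i++) {
			unsigned int ret = engines[i].report_syscall_entry(
				action | flags, arg0, engines[i].state);
			if ((ret & MOCK_RESUME_MASK) < action)
				action = ret & MOCK_RESUME_MASK;
		}
		if (action != UTRACE_STOP)
			return rounds;       /* the system call would run now */
		on_stopped(arg0);            /* asynchronous work while stopped */
		flags = UTRACE_SYSCALL_RESUMED;
	}
}

/* Engine that stops once per syscall, tracked with its own flag. */
static unsigned int stopper_entry(unsigned int action, long *arg0, void *state)
{
	bool *done = state;

	(void)action;
	(void)arg0;
	if (*done)
		return UTRACE_RESUME;
	*done = true;
	return UTRACE_STOP;
}

/* "Last look" tracer: records the argument only when nobody stops. */
struct tracer { long seen; int records; };

static unsigned int tracer_entry(unsigned int action, long *arg0, void *state)
{
	struct tracer *t = state;

	if ((action & MOCK_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;
	t->seen = *arg0;
	t->records++;
	return UTRACE_RESUME;
}

/* Hypothetical debugger edit made while the thread is stopped. */
static void debugger_edits(long *arg0)
{
	*arg0 = 42;
}
```

With the stopping engine called first, the tracer defers on the first pass. It then records the edited value (42) on the repeat pass, so the syscall entry takes two passes in this model.
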
 
 I don't know where the longer explanation and/or examples belong.
 Perhaps in a new section in utrace.tmpl?  We could start with putting
 together some text on the wiki.  Another idea is to add a few example
 modules in samples/utrace/.  

Re: resuming after stop at syscall_entry

2009-04-22 Thread David Smith
Roland McGrath wrote:

This processing makes sense I think.  It is a bit complicated of course,
but not unnecessarily so.

I'd like to ask you how this stuff would relate to systemtap (so I've
added the systemtap mailing list).  I've interspersed a few
comments/questions below.

... stuff deleted ...
 SYSCALL_ENTRY is unlike all other events.  Right after this callback
 loop is when the important user-visible stuff happens (the system call).
 So we stop immediately there as for the other two.  But, if another
 engine used UTRACE_STOP and maybe did something asynchronously, like
 modifying the syscall argument registers, you get no opportunity to see
 what happened.  Once all engines lift UTRACE_STOP, the system call runs.

... stuff deleted ...

 As explained above, the norm of interacting with other engines and their
 use of UTRACE_STOP is to use the final report.  When your callback's
 action argument includes UTRACE_STOP, you know an earlier engine might
 be fiddling before the thread resumes.  So, your callback can decide to
 return UTRACE_REPORT.  That ensures that some report_quiesce (or
 report_signal/UTRACE_SIGNAL_REPORT) callback will be made after the
 other engine lifts its UTRACE_STOP and before user mode.  At that point,
 you can see what user register values it might have installed, etc.  In
 all events but syscall entry, a final report_quiesce(0) serves this need.
 
 My proposal is to extend this resume report approach to the syscall
 entry case.  That is, after some report_syscall_entry callback returned
 UTRACE_STOP and we've stopped, allow for a second reporting pass after
 we've been resumed, before running the system call.  You'd get this pass
 if someone used UTRACE_REPORT.  That is, in the first callback loop, one
 engine used UTRACE_STOP and another used UTRACE_REPORT.  Then when the
 first engine used utrace_control() to resume, there would be a second
 reporting pass because of the second engine's earlier request.  Or, even
 if there was just one engine, but it used UTRACE_STOP and then used
 utrace_control(UTRACE_REPORT) to resume, then it would get the second
 reporting pass.  If someone uses UTRACE_STOP+UTRACE_REPORT in that pass,
 there would be a third pass, etc.
 
 What I have in mind is that the second (and however many more) pass
 would just be another report_syscall_entry callback to everyone with
 UTRACE_EVENT(SYSCALL_ENTRY) set.  A flag bit in the action argument says
 this is a repeat notification.
 
 I think this strikes a decent balance of not adding more callbacks and
 more arguments to bloat the API in general, while imposing a fairly
 simple burden on engines to avoid getting confused by multiple calls.
 
 A tracing-only engine that just wants to see the syscall that is going
 to be done can just do:
 
   if (utrace_resume_action(action) == UTRACE_STOP)
           return UTRACE_REPORT;
 
 at the top of report_syscall_entry, so it just doesn't think about it
 until it thinks the call will now go through.

Systemtap currently doesn't support changing syscall arguments; if it ever
does, obviously a few things would need to change.

But I think systemtap would probably fall into this category: it only wants
to see the syscall that is actually going to be done.  So systemtap could
get multiple callbacks for the same syscall, but only pay attention to the
last one, correct?

 Say an engine has a different agenda, just to see what syscall argument
 values came in from user mode before someone else changes them.  It does:
 
   if (action & UTRACE_SYSCALL_RESUMED)
           return UTRACE_RESUME;
 
 to ignore the additional callbacks that might come after somebody
 decided to stop and report.  It just does its work on the first one.
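A sketch of this "first look" engine follows, again as a userspace mock with placeholder constants. The first_look struct is hypothetical, and so is the plain long that stands in for the argument register.

```c
enum { UTRACE_STOP = 0, UTRACE_RESUME = 5 };  /* placeholder values */
#define UTRACE_SYSCALL_RESUMED 0x100u          /* placeholder flag bit */

/* Hypothetical per-thread record of what came in from user mode. */
struct first_look {
	long original_arg0;
	int records;
};

static unsigned int first_look_syscall_entry(unsigned int action, long arg0,
					     struct first_look *fl)
{
	/* A repeat: someone stopped after our first look; ignore it. */
	if (action & UTRACE_SYSCALL_RESUMED)
		return UTRACE_RESUME;
	fl->original_arg0 = arg0;  /* value before any engine edits it */
	fl->records++;
	return UTRACE_RESUME;
}
```

Per the kerneldoc text, only second and later callbacks carry UTRACE_SYSCALL_RESUMED. This engine therefore always records the value as it arrived from user mode, even if the argument (here 7, later changed to 42) is rewritten before the repeat pass.
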
 
 Here comes Renzo again!  He wants to have two or three or nineteen
 layers of the first kind of Renzo engine: each one stops at syscall
 entry, then resumes after changing some registers.  He wants these to
 nest, meaning that after the outermost one stops, fiddles, and
 resumes, the next one in stops, looks at the registers as fiddled by
 the outermost guy, fiddles in a different way, and resumes, and on and
 on.  Perhaps the first model (if last guy is stopping, punt to look
 again at resume report) works for that.  Or perhaps the engine also
 needs to keep track with its own state flag that it sets whenever it does
 its work and then resets in exit tracing to prepare for the next time.
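One way to read the nested-layer idea is as a combination of both models. Each layer punts while someone else is stopped, and otherwise uses a per-thread flag to take exactly one turn per syscall. The sketch below is a userspace mock built on that reading. struct layer, its mul/add fields and run_layers() are hypothetical, and the constants are placeholders. Layers are listed in callback order, and each one rewrites the argument as arg0 * mul + add while the thread is stopped.

```c
#include <stdbool.h>

enum { UTRACE_STOP = 0, UTRACE_REPORT = 2, UTRACE_RESUME = 5 };
#define MOCK_RESUME_MASK       0x0fu   /* placeholder values, not the kernel's */
#define UTRACE_SYSCALL_RESUMED 0x100u

struct layer {
	bool did_work;  /* hypothetical per-thread flag: already fiddled this syscall */
	long mul, add;  /* how this layer rewrites the argument */
};

static unsigned int layer_syscall_entry(struct layer *l, unsigned int action)
{
	if ((action & MOCK_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;  /* someone else is fiddling: look again later */
	if (l->did_work)
		return UTRACE_RESUME;  /* our turn already came and went */
	l->did_work = true;
	return UTRACE_STOP;            /* stop so we can fiddle while stopped */
}

static void layer_syscall_exit(struct layer *l)
{
	l->did_work = false;           /* re-arm for the next syscall */
}

/* Mock core: repeat passes until nobody stops; the stopping layer edits *arg0. */
static int run_layers(struct layer *ls, int n, long *arg0)
{
	unsigned int flags = 0;
	int rounds = 0;

	for (;;) {
		unsigned int action = UTRACE_RESUME;
		int stopper = -1;

		rounds++;
		for (int i = 0; i < n; i++) {
			unsigned int ret = layer_syscall_entry(&ls[i], action | flags);

			if (ret == UTRACE_STOP && stopper < 0)
				stopper = i;
			if (ret < action)
				action = ret;
		}
		if (action != UTRACE_STOP)
			return rounds;
		/* Asynchronous edit made while the thread is held stopped. */
		*arg0 = *arg0 * ls[stopper].mul + ls[stopper].add;
		flags = UTRACE_SYSCALL_RESUMED;
	}
}
```

Consider a doubling layer called before an add-10 layer, with an initial argument of 1. The first layer edits on pass one (giving 2). The second layer sees that value and edits on pass two (giving 12). Pass three lets the call proceed, and layer_syscall_exit() clears each flag for the next syscall.
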

... stuff deleted ...

 So, even I can't write that much text and still think this interface
 choice is simple to understand.  But I kind of think it's around as
 simple as it can be for its mandates.  I'd appreciate any feedback.

This is understandable, but does hurt my head a *little* bit.  I think
if you put the above full text somewhere and provided some examples this
would make sense to people.

-- 
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)