Re: resuming after stop at syscall_entry

2009-04-28 Thread David Smith
Roland McGrath wrote:
 This processing makes sense I think.  It is a bit complicated of course,
 but not unnecessarily so.
 
 Glad to hear it!
 
 A tracing-only engine that just wants to see the syscall that is going
 to be done can just do:

 if (utrace_resume_action(action) == UTRACE_STOP)
 return UTRACE_REPORT;

 at the top of report_syscall_entry, so it just doesn't think about it
 until it thinks the call will go now through.  
 Systemtap currently doesn't support changing syscall arguments, if it
 does, obviously a few things would need to change.

 But, I think systemtap would probably fall here - only see the syscall
 that is actually going to be done.  So systemtap could possibly get
 multiple callbacks for the same syscall, but only pay attention to the
 last one, correct?
 
 Correct.  The advice quoted above is what its callbacks would do to ignore
 the callbacks before the last one.
 
 Note that you'll only be sure you're seeing actually going to be done
 state if yours is the first engine attached.  (Thus, by the new special
 case calling order, its will be the last report_syscall_entry callback to
 run.)  This is just the general engine priority thing, not anything new.
 
 In cases like ptrace and kmview (Renzo's thing), even if these engines are
 first (i.e. called after yours), you will still be seeing the final state
 because they did their changes asynchronously before resuming.  But some
 other engine might do its changes directly in its own callback instead
 (whether it used UTRACE_STOP and got a repeat callback, or just on the
 first time through without stopping), so those changes would happen only
 after your last callback.
 
 In the same vein, earlier engines (i.e. here called after yours) might
 use UTRACE_STOP after your first callback had every reason to believe it
 was the last one (i.e. that if did not hit).  In that case, you will get
 a repeat call (with UTRACE_SYSCALL_RESUMED flag).  On that call, you need
 to cope with the fact that you already did your entry tracing work before
 (but now things may have changed).  
 
 If the theory is that you want to respect your place in the engine order,
 whatever that is (i.e., if your tracing just reported a lie, it was the lie
 you were supposed to believe), then coping just means ignoring the
 repeat.  (This is no different in kind from an earlier engine/later
 callback changing the registers after your callback and never stopping.)
 
 For that you need to keep track of whether you already handled it or not.
 (Depending on your relative order and the actions of the other engines, you
 might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or
 after you handled it.  So you can't use those alone.)  You can do this in
 two ways.  One is to use your own per-thread state (engine-data, etc.).
 The other is to disable the SYSCALL_ENTRY event when you've handled it, so
 you won't get more callbacks.  Then you can re-enable the event in your
 report_syscall_exit callback (or report_quiesce/report_signal, or whatever
 is most convenient to be sure you'll run before it goes back to user mode).
 i.e., use utrace_set_events() from the callbacks.

It sounds like disabling SYSCALL_ENTRY then re-enabling it in the
report_syscall_exit() callback is a reasonable way to go.

 This is understandable, but does hurt my head a *little* bit.  I think
 if you put the above full text somewhere and provided some examples this
 would make sense to people.
 
 The utrace-syscall-resumed branch puts this in the kerneldoc text for
 struct utrace_engine_ops (where callback return values and common arguments
 are described):
 
   * When %UTRACE_STOP is used in @report_syscall_entry, then @task
 + * stops before attempting the system call.  In this case, another
 + * @report_syscall_entry callback follows after @task resumes; in a
 + * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
 + * @action argument to indicate a repeat callback still waiting to
 + * attempt the same system call invocation.  This repeat callback
 + * gives each engine an opportunity to reexamine registers another
 + * engine might have changed while @task was held in %UTRACE_STOP.
 + *
 + * In other cases, the resume action does not take effect until @task
 + * is ready to check for signals and return to user mode.  If there
 + * are more callbacks to be made, the last round of calls determines
 + * the final action.  A @report_quiesce callback with @event zero, or
 + * a @report_signal callback, will always be the last one made before
 + * @task resumes.  Only %UTRACE_STOP is sticky--if @engine returned
 + * %UTRACE_STOP then @task stays stopped unless @engine returns
 + * different from a following callback.
 
 I don't know where the longer explanation and/or examples belong.
 Perhaps in a new section in utrace.tmpl?  We could start with putting
 together some text on the wiki.  Another idea is to add a few example
 modules in samples/utrace/.  

syscall tracing overheads: utrace vs. kprobes

2009-04-28 Thread Frank Ch. Eigler
Hi -

In a few contexts, it comes up as to whether it is faster to probe
process syscalls with kprobes or with something higher level such as
utrace.  (There are other hypothetical options too (per-syscall
tracepoints) that could be measured this way in the future.)  

It was time to check the intuitions about the overheads.  So, choosing
a syscall that won't get short-circuited via vdso:

% cat foo.c
#include unistd.h

int main ()
{
  unsigned c;
  for (c=0; c1000; c++)
 (void) close (1000);
}
% gcc foo.c

Now we compare these scenarios:

# stap -e 'probe never {}' -t --vp 1 -c a.out

Here, no actual probing occurs so we get a measurement of the plain
uninstrumented run time of ten million close(2)s.

# stap -e 'probe process.syscall {}' -t --vp 1 -c a.out

Here, we intercept sys_close with a kprobe.  If the system is not too
busy, we should pick up only the close(2)s coming from a.out, though a
few close(2)'s executed by other processes may show up.

# stap -e 'probe syscall.close {}' -t --vp 1 -c a.out

Here, we intercept all a.out's syscalls with utrace.  Other processes
are not affected at all, but other syscalls by a.out would be --
though in our test, there are hardly any of those.


Some typical results on my 2.66GHz 2*Xeon5150 machine runnin Fedora 9 -
2.6.27.12:

never:  
Pass 5: run completed in 740usr/3310sys/4155real ms.

kprobe: 
probe syscall.close (input:1:1), hits: 1028, cycles: 176min/202avg/3632max
Pass 5: run completed in 750usr/9320sys/10193real ms.

utrace: 
probe process.syscall (input:1:1), hits: 1025, cycles: 
176min/209avg/184392max
Pass 5: run completed in 1670usr/6860sys/8645real ms.

So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the
uninstrumented 4.1 second run time.  But wait: we should subtract the
time taken by the probe handler itself: 200ish cycles at 2.66 GHz,
which is about 0.75 seconds.  So the overheads are approximately:

never: n/a
kprobe: 5.2 seconds = 0.52 us per hit
utrace: 3.6 seconds = 0.36 us per hit


Note that these are microbenchmarks that represent an ideal case
compared to a larger run, since they probably fit comfily inside
caches.  They probably also undercount the probe handler's run time.


- FChE



Re: [PATCH] Clarify UTRACE_ATTACH_EXCLUSIVE a bit more

2009-04-28 Thread Roland McGrath
Thanks.  I put that in but rewrote the new paragraph.

Roland



Re: syscall tracing overheads: utrace vs. kprobes

2009-04-28 Thread Roland McGrath
 Certainly, in general.  But in this specific test, only the under-test
 system calls occurred in essnetially the whole system, so the overhead
 measurements were in a way the bare minimums imposed by the kprobes
 vs. utrace callback infrastructure itself.

Yes.  That's why I meant to explain how these numbers are true but not
necessarily the numbers that matter.

  In real situations [...]  the distribution of overheads will be
  quite different.
 
 Or rather, the basic overhead quanta measured above may be multiplied
 along several different axes.

Yes.



Re: syscall tracing overheads: utrace vs. kprobes

2009-04-28 Thread David Smith
Frank Ch. Eigler wrote:
 Hi -
 
 In a few contexts, it comes up as to whether it is faster to probe
 process syscalls with kprobes or with something higher level such as
 utrace.  (There are other hypothetical options too (per-syscall
 tracepoints) that could be measured this way in the future.)  

These scenarios are a bit wrong:

 Now we compare these scenarios:
 
 # stap -e 'probe never {}' -t --vp 1 -c a.out
 
 Here, no actual probing occurs so we get a measurement of the plain
 uninstrumented run time of ten million close(2)s.

The above one is fine.

 # stap -e 'probe process.syscall {}' -t --vp 1 -c a.out
 
 Here, we intercept sys_close with a kprobe.  If the system is not too
 busy, we should pick up only the close(2)s coming from a.out, though a
 few close(2)'s executed by other processes may show up.
 
 # stap -e 'probe syscall.close {}' -t --vp 1 -c a.out
 
 Here, we intercept all a.out's syscalls with utrace.  Other processes
 are not affected at all, but other syscalls by a.out would be --
 though in our test, there are hardly any of those.

These 2 are swapped:  the 'process.syscall' probe is a utrace-based
probe and the 'syscall.close' probe is a kprobe-based probe.

Note that in the results, the description and probe types matched correctly.

 Some typical results on my 2.66GHz 2*Xeon5150 machine runnin Fedora 9 -
 2.6.27.12:
 
 never:  
 Pass 5: run completed in 740usr/3310sys/4155real ms.
 
 kprobe: 
 probe syscall.close (input:1:1), hits: 1028, cycles: 
 176min/202avg/3632max
 Pass 5: run completed in 750usr/9320sys/10193real ms.
 
 utrace: 
 probe process.syscall (input:1:1), hits: 1025, cycles: 
 176min/209avg/184392max
 Pass 5: run completed in 1670usr/6860sys/8645real ms.
 
 So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the
 uninstrumented 4.1 second run time.  But wait: we should subtract the
 time taken by the probe handler itself: 200ish cycles at 2.66 GHz,
 which is about 0.75 seconds.  So the overheads are approximately:
 
 never: n/a
 kprobe: 5.2 seconds = 0.52 us per hit
 utrace: 3.6 seconds = 0.36 us per hit
 
 
 Note that these are microbenchmarks that represent an ideal case
 compared to a larger run, since they probably fit comfily inside
 caches.  They probably also undercount the probe handler's run time.

-- 
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)