Re: The demise of utracer.

2008-06-30 Thread Chris Moller
So now we have four possibles: extending ptrace, a new 
ptrace-replacement syscall, a systemtap-based thing, and, now, ntrace.  
Things are diverging rather than converging.


A couple of comments concerning the /proc entry fd method utracer uses 
and that some people have expressed a preference for:  The first is that
it's clumsy.  When the utracer module loads, it creates a /proc 
pseudo-directory /proc/utrace and then a pseudo-file 
/proc/utrace/control.  When an app wants access to utracer services it 
opens that control file and ioctl()s [1] a register request to it.  
This causes the module to create another pseudo-directory 
/proc/utrace/app-pid and two new pseudo-files 
/proc/utrace/app-pid/cmd and /proc/utrace/app-pid/resp.  The app 
then has to open those two pseudo files, using asprintf() or some such 
to build the file name strings.  This strikes me as a ridiculous amount 
of arm-waving.  
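To make the sequence above concrete, here is a rough userspace sketch of what a registering app has to do.  It is only an illustration: the request code UTRACER_REGISTER_HYPOTHETICAL, the ioctl argument, the open modes, and the assumption that "app-pid" is the decimal pid are all made up for the example and are not utracer's real interface.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL request code: the real utracer value is not given in this thread. */
#define UTRACER_REGISTER_HYPOTHETICAL 1UL

struct utracer_fds { int ctl, cmd, resp; };

/* Build "/proc/utrace/<pid>/<leaf>" into buf; returns 0 on success, -1 if it
 * does not fit.  Assumes the per-app directory is named by decimal pid. */
int utracer_path(char *buf, size_t len, pid_t pid, const char *leaf)
{
    int n = snprintf(buf, len, "/proc/utrace/%ld/%s", (long)pid, leaf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Walk the three-step dance: open control, ioctl a register request,
 * then open the two per-app pseudo-files.  Returns 0 on success. */
int utracer_register(struct utracer_fds *fds)
{
    char path[64];
    pid_t me = getpid();

    fds->ctl = fds->cmd = fds->resp = -1;

    fds->ctl = open("/proc/utrace/control", O_RDWR);
    if (fds->ctl < 0)
        return -1;                 /* module not loaded, or no permission */

    /* Argument is an assumption; the real request layout is unknown here. */
    if (ioctl(fds->ctl, UTRACER_REGISTER_HYPOTHETICAL, (long)me) < 0)
        goto fail;

    if (utracer_path(path, sizeof path, me, "cmd") < 0 ||
        (fds->cmd = open(path, O_RDWR)) < 0)
        goto fail;
    if (utracer_path(path, sizeof path, me, "resp") < 0 ||
        (fds->resp = open(path, O_RDWR)) < 0)
        goto fail;
    return 0;

fail:
    if (fds->cmd >= 0) close(fds->cmd);
    if (fds->ctl >= 0) close(fds->ctl);
    fds->ctl = fds->cmd = -1;
    return -1;
}
```

Three opens, one ioctl, and two string builds just to get a command/response channel, which is the arm-waving being complained about.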

Further, /proc entries are meant to be publicly accessible ways of 
accessing kernel data and, to a lesser extent (possibly even including 
zero--I can't think of an example at the moment), services.  utracer 
violates this implicit paradigm.  The utracer entries under /proc/utrace 
are intended to be exclusively a means of communication between the 
registering app and the module--I even set the permissions to make sure 
of that--and might even violate the data vs. services thing.


Further, using /proc entries requires that the module keep appropriate 
structs for each of those entries, close the entries, and clean up
the structs when the requesting app de-registers.  All cool except when 
the requesting app either crashes or the app writer just exits without 
bothering to de-register first--you can build up a lot of dead structs 
that way.  utracer solves this problem by utrace-attaching an engine to 
every registering app just to get notification of its unexpected demise
and then run around cleaning up after it.


All of the foregoing is by way of saying that the /proc entry fd method, 
even though it looks fairly cool, has its problems and is fairly
inefficient--not even including the unknown (to me, at least) 
inefficiencies of overhead in the read()/write()/ioctl() mechanism.


This isn't to say, however, that the fd (hence select()/poll()/whatever) 
approach can't be made to work.  I haven't even done five minutes of 
research on it, but it seems possible to me that a ptrace() extension 
request could be something like fd = ptrace(UTRACE_GET_FD, ...) (or 
equivalent new utrace syscall request) which returns an fd suitable for 
use by poll() and select(), and possibly readable, writable, and 
ioctl-able as well, without having an actual underlying pseudo-file and 
the attendant overhead and problems described above.
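As a sketch of how an app might consume such an fd: UTRACE_GET_FD does not exist, so the code below only shows the waiting side and takes the descriptor as a parameter.  The function name wait_for_trace_event is made up for the example, and any pollable descriptor (a pipe, say) can stand in for the hypothetical trace fd when trying it out.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for trace_fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup without data.
 * trace_fd is whatever the (hypothetical) UTRACE_GET_FD request would return. */
int wait_for_trace_event(int trace_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = trace_fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;                 /* 0 = timeout, -1 = poll() failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

The point is that nothing on the app side needs to know whether a pseudo-file sits behind the descriptor; poll() and select() only care that the fd supports them.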




[1] Most control operations in utracer are based on ioctl() rather than 
read() and write().  utracer supports a fair number of requests for 
information [2] but read() has no way of specifying which information
the app is after and write() has no way of retrieving the information.  
ioctl() is open-ended in that regard and can specify in arbitrary detail 
not only what information the app wants but provide a pointer to a place 
to store complex results as well.  The downside is that while simple
mechanisms are documented by which to register read() and write()
methods, I couldn't find any documented mechanism by which to register an
ioctl() method.  I found a way that works: There's a struct associated
with top level /proc entries that contains a pointer to another struct 
that contains a pointer, invariably null so far as I can tell, to an 
ioctl method.  In other than top-level /proc entries, the pointer to the 
secondary struct is null so I make a copy of the top level struct, stuff 
in a pointer to my own ioctl() method, then point the original 
subordinate struct at the modified copy.  This is so appallingly clumsy 
that either I'm very badly missing something or no one has ever needed 
to ioctl() to a /proc entry before and the mechanism to do so hasn't 
been developed.  The scary part of my hack, of course, is that you never 
know when someone's going to change something and break it.
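For illustration, this is roughly what "specify what you want and where to put it" looks like from the app side with ioctl().  The struct, its field names, the 'u' magic character, and the request number are all invented for the example; only the _IOWR encoding macro from the Linux headers is real.

```c
#include <linux/ioctl.h>

/* HYPOTHETICAL request block: says which task is meant and where the
 * module should store the (binary) answer. */
struct utracer_query_hypothetical {
    long          pid;         /* task being asked about            */
    void         *result;      /* app-supplied buffer for the reply */
    unsigned long result_len;  /* size of that buffer in bytes      */
};

/* _IOWR marks the request as passing data both ways and records the
 * argument size in the request number itself. */
#define UTRACER_QUERY_HYPOTHETICAL \
    _IOWR('u', 1, struct utracer_query_hypothetical)
```

A single call such as ioctl(fd, UTRACER_QUERY_HYPOTHETICAL, &query) then carries both the question and the destination, which neither read() nor write() can do alone.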


[2]  Information like, e.g. a task's mapped memory regions.  This is 
available just by reading the right pseudo-file in /proc/pid, but when 
an app reads that file, the kernel waves its arms a while extracting 
data from the relevant struct mm_struct and struct vm_area_struct, 
formatting it into ascii strings, and sending it to the requesting app.  
The app then has to wave /its/ arms for a while parsing the ascii 
strings to get back exactly the same information the kernel had in the 
first place.  This rather badly offends my engineer's sense of 
efficiency, so utracer provides a means of accessing such information 
directly, in binary, in a struct the app doesn't have to parse. [3]
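To show the round trip being objected to, here is a small sketch of the app-side parsing for one line of /proc/pid/maps.  The struct map_region and the function name are made up for the example; the field layout follows the usual "start-end perms offset dev inode path" line format, and the trailing path is ignored.

```c
#include <stdio.h>

/* App-side copy of what the kernel already had in binary form. */
struct map_region {
    unsigned long start, end;   /* address range             */
    char          perms[5];     /* e.g. "r-xp"               */
    unsigned long offset;       /* offset into the mapping   */
    unsigned int  dev_major, dev_minor;
    unsigned long inode;
};

/* Parse one maps line back into numbers.  Returns 0 on success, -1 if
 * the line does not have the expected seven leading fields. */
int parse_maps_line(const char *line, struct map_region *r)
{
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu",
                   &r->start, &r->end, r->perms, &r->offset,
                   &r->dev_major, &r->dev_minor, &r->inode);
    return n == 7 ? 0 : -1;
}
```

Every field recovered here started life as a number in the kernel, got printed as text, and is now being scanned back into a number.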


[3]  Yes, I know [2] was a subordinate footnote to [1] and is considered
Bad Form by some.

Re: The demise of utracer.

2008-06-30 Thread Frank Ch. Eigler
Chris Moller [EMAIL PROTECTED] writes:

> [...]
> and that some people have expressed a preference for:  The first is
> that it's clumsy.  When the utracer module loads, it creates a /proc
> pseudo-directory /proc/utrace and then a pseudo-file
> /proc/utrace/control.  [...]

FWIW, it'd make more sense to me if such a file were per-process
(under the /proc/$pid/ hierarchy).

> [...] Further, using /proc entries requires that the module keep
> appropriate structs for each of those entries, close the
> entries, and clean up the structs when the requesting app
> de-registers.  All cool except when the requesting app either
> crashes or the app writer just exits without bothering to
> de-register first [...]

This really should not be a problem.  The kernel tells you when a file
descriptor (such as /proc/$$/utrace) gets released, no matter the
cause.  No cooperation from the userspace clients is needed to
unregister, just close() or die.
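A userspace analogue of this, as a sketch only: a child process dies while still holding a descriptor open, and the other end notices anyway because the kernel releases the descriptor on exit.  In a module, the corresponding hook is the release method in struct file_operations; the pipe and the function name below are just stand-ins for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent sees end-of-file after the child dies without
 * closing its descriptor, 0 if it does not, -1 on setup failure. */
int release_seen_after_exit(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (child == 0) {
        close(p[0]);
        _exit(0);              /* die holding p[1]; no explicit close() */
    }

    close(p[1]);               /* parent keeps only the read end */
    char c;
    ssize_t n = read(p[0], &c, 1);  /* returns 0 once the last writer is gone */
    close(p[0]);
    waitpid(child, NULL, 0);
    return n == 0 ? 1 : 0;
}
```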

- FChE



Re: [Ksummit-2008-discuss] DTrace

2008-06-30 Thread Jim Keniston
On Mon, 2008-06-30 at 13:12 -0400, Frank Ch. Eigler wrote:
> Hi -
>
> On Mon, Jun 30, 2008 at 01:29:13PM +0200, Christoph Hellwig wrote:
>
> > [...]  This might be getting a little offtopic for the kernel summit
> > discuss list, but let's start anyway, we can move this to a better
> > suited list, although I can't think of one except for linux-kernel.
>
> [EMAIL PROTECTED]
> utrace-devel@redhat.com
>
>
> > I'm not sure if that's the current design, but I can't find any
> > evidence in the code that it allows running handlers in process
> > context, all that's available is a kernel callback.  [...]

To clarify, it's a kernel callback that runs in the context of the
probed thread -- like other utrace-based callbacks.  And like other
utrace-based callbacks, a uprobes handler can block for stuff like
copy_to/from_user()... although I believe systemtap will support only
non-blocking handlers for now.

 
> For systemtap's purposes, that is sufficient.  Our probes are meant to
> run non-intrusively (they do not mess with user thread scheduling,
> their VM state, strictly limited time & space consumption), so
> actually injecting equivalent snippets of code into userspace for
> execution there does not seem to buy anything.  Plus, like dtrace, we
> want scripts to be able to intermix probes (= share data) amongst
> kernel and multiple user-space threads, and this seems most naturally
> done by running the probes themselves in kernel space.

Yes.

 
 
> > [...] What we really need is a userspace interface so that it
> > actually can be used by thing like frysk or an implementation of the
> > userspace dtrace hooks.

Userspace dtrace hooks could be probed using systemtap-generated
uprobes, whether or not the hooks all funnel into the same user-space
handler function.

 
> That will get built as other tools require it.  Systemtap per se will
> likely not.

Two years back, we explored the possibility of systemtap translating a
script into an ad hoc tracer app that used ptrace.  The idea was that
that would suffice in cases where the user doesn't care to see what's
going on in the kernel.  My experience was that ptrace wasn't up to the
task.  Perhaps when we nail down the right utrace-based,
ptrace-replacement system call interface (utracer II, or whatever -- see
the current discussion on the utrace-devel list), we should revisit
that option.  It would make systemtap accessible to the ordinary
application programmer, without him/her needing root or stapdev to bless
his/her script.

Stuff that's in uprobes (e.g., kprobes-style single-stepping out of
line, to allow real-time tracing of multithreaded apps) can be made
available to the new syscall API and/or utrace.

 
 
> > [...] For complex traces doing this in userspace is for sure a better idea.
>
> Can you elaborate upon this more complex scenario?
>
>
> - FChE

Jim Keniston