Re: The demise of utracer.
Roland McGrath wrote: Sorry to be blunt, Chris. But I think you're headed down a useless rat hole. I agree that the usage of /proc you've described is a bad interface. I am slightly mystified as to how that came to be what you settled on.

At the time Andrew Cagney and I decided to go that route, it was because frysk had no means of effecting kernel changes other than by use of a loadable module, and the use of /proc entries was a common, easily accessible means of communicating with modules.

I don't think it's worthwhile to hash over that. Let's move on.

I'd love to, but it would be nice to have a clue as to which direction. All I'm getting from The World is a list of stuff I shouldn't be doing, and that helps not at all with regard to what I /should/ be doing.

Please forget ptrace. Please forget about adding syscalls. At this point I think I just need you to give me the benefit of the doubt when I tell you I am sure this is not the way, and even dabbling sidetracks us from really useful progress. Let's move on.

Again, ptrace hacks and new syscall hacks are things I can actually do, and in the absence of any other clue concerning what I should be doing it's what I've been doing--I know it's been a near-total waste of my time, but it kinda beats staring at a blank screen all day. I'll be glad to give you the benefit of the doubt--you've been kernel hacking longer than I have--but if you have cool notions about which way to go, you kinda need to let the rest of us know what they are. (And, reading ahead, yeah, I know, that's what the rest of this note is...)

I'll commence to hackin'.

cm

-- Chris Moller I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant. -- Robert McCloskey
Re: The demise of utracer.
On Tue, Jul 01, 2008 at 03:28:04PM -0400, Chris Moller wrote: K.Prasad wrote: Hi All, Sorry if I have missed out something I need to know before I respond to this email. But the trace infrastructure (lib/trace.c) already provides such a facility, with more features such as a per-cpu buffer for faster transmission (it is a wrapper over relay, which sits on top of debugfs). The interfaces provided by trace are much simpler/more functional than setting up a debugfs interface manually (see samples/trace/fork_trace.c), and the directory structure and control files set up by trace are already familiar to the systemtap code. Thanks, K.Prasad P.S.: trace is currently in the -mm tree.

Thought it might be interesting to check this out--the patched 2.6.26-rc5 kernel built fine but panicked when I tried to boot it.

So you might want to try out 2.6.26-rc5-mm1 directly. It boots fine on my T60p with default configs. Thanks, K.Prasad
Re: The demise of utracer.
K.Prasad wrote: On Tue, Jul 01, 2008 at 03:28:04PM -0400, Chris Moller wrote: K.Prasad wrote: Hi All, Sorry if I have missed out something I need to know before I respond to this email. But the trace infrastructure (lib/trace.c) already provides such a facility, with more features such as a per-cpu buffer for faster transmission (it is a wrapper over relay, which sits on top of debugfs). The interfaces provided by trace are much simpler/more functional than setting up a debugfs interface manually (see samples/trace/fork_trace.c), and the directory structure and control files set up by trace are already familiar to the systemtap code. Thanks, K.Prasad P.S.: trace is currently in the -mm tree.

Thought it might be interesting to check this out--the patched 2.6.26-rc5 kernel built fine but panicked when I tried to boot it.

So you might want to try out 2.6.26-rc5-mm1 directly. It boots fine on my T60p with default configs.

Hmmm. Okay, I'll try that, thx. (I was using the -mm3 patch and an oldconfig from Fedora i686.)

Thanks, K.Prasad

-- Chris Moller I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant. -- Robert McCloskey
Re: The demise of utracer.
So now we have four possibles: extending ptrace, a new ptrace-replacement syscall, a systemtap-based thing, and, now, ntrace. Things are diverging rather than converging.

A couple of comments concerning the /proc entry fd method that utracer uses and that some people have expressed a preference for:

The first is that it's clumsy. When the utracer module loads, it creates a /proc pseudo-directory /proc/utrace and then a pseudo-file /proc/utrace/control. When an app wants access to utracer services it opens that control file and ioctl()s [1] a register request to it. This causes the module to create another pseudo-directory /proc/utrace/app-pid and two new pseudo-files /proc/utrace/app-pid/cmd and /proc/utrace/app-pid/resp. The app then has to open those two pseudo-files, using asprintf() or some such to build the file name strings. This strikes me as a ridiculous amount of arm-waving.

Further, /proc entries are meant to be publicly accessible ways of accessing kernel data and, to a lesser extent (possibly even including zero--I can't think of an example at the moment), services. utracer violates this implicit paradigm. The utracer entries under /proc/utrace are intended to be exclusively a means of communication between the registering app and the module--I even set the permissions to make sure of that--and might even violate the data vs. services thing.

Further, using /proc entries requires that the module keep appropriate structs for each of those entries, and that it close the entries and clean up the structs when the requesting app de-registers. All cool except when the requesting app either crashes or the app writer just exits without bothering to de-register first--you can build up a lot of dead structs that way. utracer solves this problem by utrace-attaching an engine to every registering app just to get notification of its unexpected demise and then running around cleaning up after it.

All of the foregoing is by way of saying that the /proc entry fd method, even though it looks fairly cool, has its problems and is fairly inefficient--not even including the unknown (to me, at least) inefficiencies of overhead in the read()/write()/ioctl() mechanism.

This isn't to say, however, that the fd (hence select()/poll()/whatever) approach can't be made to work. I haven't even done five minutes of research on it, but it seems possible to me that a ptrace() extension request could be something like fd = ptrace(UTRACE_GET_FD, ...) (or an equivalent new utrace syscall request) which returns an fd suitable for use by poll() and select(), and possibly readable, writable, and ioctl-able as well, without having an actual underlying pseudo-file and the attendant overhead and problems described above.

[1] Most control operations in utracer are based on ioctl() rather than read() and write(). utracer supports a fair number of requests for information [2], but read() has no way of specifying which information the app is after and write() has no way of retrieving the information. ioctl() is open-ended in that regard: it can specify in arbitrary detail not only what information the app wants but also provide a pointer to a place to store complex results. The down side is that while simple mechanisms are documented by which to register read() and write() methods, I couldn't find any documented mechanism by which to register an ioctl() method. I found a way that works: there's a struct associated with top-level /proc entries that contains a pointer to another struct that contains a pointer, invariably null so far as I can tell, to an ioctl method. In other than top-level /proc entries, the pointer to the secondary struct is null, so I make a copy of the top-level struct, stuff in a pointer to my own ioctl() method, then point the original subordinate struct at the modified copy. This is so appallingly clumsy that either I'm very badly missing something or no one has ever needed to ioctl() to a /proc entry before and the mechanism to do so hasn't been developed. The scary part of my hack, of course, is that you never know when someone's going to change something and break it.

[2] Information like, e.g., a task's mapped memory regions. This is available just by reading the right pseudo-file in /proc/pid, but when an app reads that file, the kernel waves its arms a while extracting data from the relevant struct mm_struct and struct vm_area_struct, formatting it into ascii strings, and sending it to the requesting app. The app then has to wave /its/ arms for a while parsing the ascii strings to get back exactly the same information the kernel had in the first place. This rather badly offends my engineer's sense of efficiency, so utracer provides a means of accessing such information directly, in binary, in a struct the app doesn't have to parse. [3]

[3] Yes, I know [2] was a subordinate footnote to [1] and considered Bad Form by some.
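For readers who want to see the client-side arm-waving concretely, here is a rough sketch of just the name-building step described in the message. It assumes that "app-pid" means the decimal pid of the registering app; the helper name and its signature are hypothetical and are not taken from the utracer sources. It uses snprintf() rather than asprintf() only to stay portable.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not from utracer): build the names of the per-app
 * command and response pseudo-files described in the message, assuming the
 * per-app directory is named after the decimal pid.
 * Returns 0 on success, -1 if either buffer is too small. */
int utracer_client_paths(pid_t pid, char *cmd, size_t cmd_len,
                         char *resp, size_t resp_len)
{
    int n = snprintf(cmd, cmd_len, "/proc/utrace/%ld/cmd", (long)pid);
    if (n < 0 || (size_t)n >= cmd_len)
        return -1;              /* truncated or formatting error */

    n = snprintf(resp, resp_len, "/proc/utrace/%ld/resp", (long)pid);
    if (n < 0 || (size_t)n >= resp_len)
        return -1;

    return 0;
}
```

A client would call this after the register ioctl() on /proc/utrace/control succeeds and then open() both names. That is two more opens on top of the control-file open, which is the overhead the message is complaining about.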
Re: The demise of utracer.
Chris Moller [EMAIL PROTECTED] writes: [...] and that some people have expressed a preference for: The first is that it's clumsy. When the utracer module loads, it creates a /proc pseudo-directory /proc/utrace and then a pseudo-file /proc/utrace/control. [...]

FWIW, it'd make more sense to me if such a file were per-process (under the /proc/$pid/ hierarchy).

[...] Further, using /proc entries requires that the module keep appropriate structs for each of those entries and to close the entries and clean up the structs when the requesting app de-registers. All cool except when the requesting app either crashes or the app writer just exits without bothering to de-register first [...]

This really should not be a problem. The kernel tells you when a file descriptor (such as /proc/$$/utrace) gets released, no matter the cause. No cooperation from the userspace clients is needed to unregister, just close() or die.

- FChE
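The point about release-on-death can be seen from userspace with an ordinary pipe, as a rough analogy. The sketch below is not utracer or kernel-module code, and the function name is made up for illustration. The child is killed while still holding the write end and never calls close(), yet the parent sees end-of-file because the kernel releases the descriptor for it. In a module, the equivalent notification would arrive through the release method of the file's struct file_operations.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent saw EOF after the child died without closing its
 * descriptor, 0 if it unexpectedly read data, -1 on error. */
int fd_released_on_death(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (child == 0) {
        close(p[0]);
        raise(SIGKILL);         /* simulate a crash: no close(), no cleanup */
        _exit(1);               /* not reached */
    }

    close(p[1]);                /* parent drops its own copy of the write end */

    char c;
    ssize_t n = read(p[0], &c, 1);  /* returns 0 once every write end is gone */
    close(p[0]);
    waitpid(child, NULL, 0);

    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0;
}
```

The same reasoning is why a per-client fd would not need a utrace engine attached just to notice the client's demise.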
Re: The demise of utracer.
Hi, On Wed, Jun 25, 2008 at 8:39 PM, David Miller [EMAIL PROTECTED] wrote: From: Chris Moller [EMAIL PROTECTED] Date: Wed, 25 Jun 2008 19:11:57 -0400 I agree in principle, but there are years and years of old cruft in gdb too and I'm not altogether sure that separating the experience from the cruft is possible or, at least, any less work than just starting over and accumulating new cruft.

For the current GDB sources, I respectfully disagree with you. It's about as clean as I've ever seen a debugger of its size and number of targets supported. It's a work of art.

In terms of backtrace generation, you could also look at the libunwind approach, especially in cases where you need fast and reliable ways to capture backtraces from a remote process.

[]s -- Bruno de Oliveira Abinader Mobile Linux Software (MLS) / Instituto Nokia de Tecnologia (INdT) Tel: +55 92 21261068
Re: The demise of utracer.
On Tue, 2008-06-24 at 22:49 -0400, Chris Moller wrote: Comments, up to and including "What the hell are you smoking?" welcome.

I'd rather phrase this as "what are we not getting out of ptrace that we really want?" Answering that will frame how we go about getting it. (I admit to not having looked closely at utracer yet, so I'm not sure how much this was done there.)

Two features I would like to see, more for debugger performance than anything else, are direct API support for breakpoints and watchpoints. The former could be implemented as single-stepping entirely within the kernel and that'd be just fine, although running until an illegal instruction trap would be even better. Watchpoint support might require the usual page protection tricks, which is admittedly non-trivial. Both of these seem like they slot in reasonably in the existing ptrace interface, but that's very much a non-expert's opinion.

In terms of correctness it seems like at least gdb still has real trouble handling multiple threads and processes sanely, but that could easily be more about gdb's failings than those of ptrace.

- ajax
Re: The demise of utracer.
On Wed, Jun 25, 2008 at 12:30:03PM -0400, Chris Moller wrote: Extending ptrace seems like a sad idea. If Linux is going to grow a new userspace-accessible debug interface, can't it go in /proc or something? Actually, that's exactly what utracer does. It's a module that creates entries under /proc (/proc/utracer/*) that client apps can read()/write()/ioctl() to access utracer capabilities. Maybe the coolest thing about utracer is that it gave every app its own /proc entry that blocked on read() until an app-defined interesting thing happened: specified signals, task state changes, specified syscall entry/exit, all the stuff accessible through utrace report_* callbacks.

So, how'd it demise in a way that a syscall interface would be any better? This sounds like the right way to do it (barring scaling details; maintaining one fd per thread becomes impractical).

Side note: every time someone talks about a ptrace replacement I suggest stealing one from Solaris :-) It seems one of the areas that Sun thought out properly, although in my limited brushes with it in the last year I'm becoming less convinced of that.

-- Daniel Jacobowitz CodeSourcery
Re: The demise of utracer.
Phil Muldoon wrote: Chris Moller wrote: Due to a complete lack of interest, I'm shelving utracer. May it collect dust in peace.

I was (and still am) interested in it from a direct api point of view. As an end-tool not so much.

Systemtap is cool, and, based on tinkering with it for a while, is great for finding deep, hairy stuff, but doesn't appear to lend itself to highly interactive use in debuggers.

When you say a Systemtap-like approach, what do you mean? I am doing poor service in describing Systemtap, but do you mean compiling a script/api into a utracer kernel module and aggregating data from that? If this were the approach, would not an assembly of scripts be the debugger itself?

I'm certainly no systemtap expert, but at least one limiting factor I see is that compiling and loading a kernel module takes time and resources way beyond the trivial--I can't imagine such an approach being workable in an interactive environment. You almost certainly couldn't do it on a per-user-interaction basis with anything like decent response time.

The whole-new-syscall thing, at which I started some tentative hacking a while ago, was based on my aversion to hacking into Other People's Code, mostly kernel/ptrace.c in this case--I just thought it more expedient to be as unobtrusive as possible. At least one downside of this, of course, is the redundancy with ptrace. (And I suspect it might be difficult to coordinate getting a new syscall number universally accepted upstream, but I don't have much of a clue about that.)

I am not sure how this would scale. It would be cool to see how it did in a basic prototype, if only to rule it out due to performance and scalability.

I'm not sure what you mean by scale. My sandbox code added one new entry in unistd.h: __NR_utrace, 327. The redundancy I'm talking about is that ptrace() capabilities, stepping, peeking, poking, etc., would continue to exist in ptrace (in kernel/ptrace.c), but would be replicated as a subset of the capabilities of utrace() (actually, syscall(SYS_utrace,...) until libc caught up) in a hypothetical kernel/utrace_ui.c. (Not so hypothetical, actually--that's what I called it.)

That leaves, unless someone has a better idea, extending ptrace. I started a bit of tentative hacking on that yesterday. Turns out, actually, that it was easy to do that minimally intrusively. (It's about a ten-line patch to kernel/ptrace.c to add a default case to the ptrace_common() request switch that calls an external fcn to decode extensions to the PTRACE_* requests, plus a patch to include/linux/ptrace.h to add the new request number #defines. All the real hacking is in the external fcn.) At least one upside of this, of course, is that there's no redundancy with ptrace.

I've no opinion on this, and it always tends to generate debate on why ptrace works, does not work, ptrace rules, ptrace sucks type arguments. I'm curious as to why you think this approach is relevant/useful though? Is it from a maintainability viewpoint? Performance?

Several things: extensions of ptrace can do arbitrary things completely unlike the capabilities of existing ptrace()--there'd be an entire new range of requests extending the usual PTRACE_POKE*, PTRACE_PEEK*, etc., stuff to include, hypothetically, things like UTRACE_WAIT. (Again, not so hypothetical: I've already sandboxed that: it acts like a super, highly controllable waitpid() that unblocks on a call-defined subset of task state changes, signals, syscall entries/exits, etc.) It certainly is more maintainable than replicating ptrace() capability in a comprehensive utrace(), and, either as an extension of the ptrace syscall or as its own utrace syscall, it's got a lot better performance than the /proc entry approach--there's a lot more overhead associated with read(), write(), and ioctl() than with a direct syscall.

Thanks Phil

-- Chris Moller I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant. -- Robert McCloskey
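The "default case that calls an external fcn" idea from the message can be modelled in plain userspace C, roughly as below. This is only a sketch of the dispatch pattern and is not the actual patch: every request number and function name here is invented for illustration, and the return values are placeholders for real work.

```c
#include <errno.h>

/* Hypothetical request numbers, standing in for PTRACE_* and the new
 * extension requests; none of these values come from the kernel. */
enum {
    REQ_PEEK     = 1,
    REQ_POKE     = 2,
    REQ_EXT_WAIT = 0x5000       /* stands in for something like UTRACE_WAIT */
};

/* Stands in for the external function that holds all the new code. */
long ext_request(long request, long arg)
{
    switch (request) {
    case REQ_EXT_WAIT:
        return arg;             /* placeholder for the real work */
    default:
        return -EINVAL;         /* unknown even to the extension layer */
    }
}

/* Stands in for the existing request switch. The only change to it is the
 * default case, which forwards anything it does not recognise. */
long handle_request(long request, long arg)
{
    switch (request) {
    case REQ_PEEK:
        return 100 + arg;       /* placeholder for existing behaviour */
    case REQ_POKE:
        return 200 + arg;       /* placeholder for existing behaviour */
    default:
        return ext_request(request, arg);
    }
}
```

The appeal of this shape is that the existing cases are untouched, so the change to the original file stays small and all new behaviour lives behind one call.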
Re: The demise of utracer.
Daniel Jacobowitz wrote: On Wed, Jun 25, 2008 at 12:30:03PM -0400, Chris Moller wrote: Extending ptrace seems like a sad idea. If Linux is going to grow a new userspace-accessible debug interface, can't it go in /proc or something? Actually, that's exactly what utracer does. It's a module that creates entries under /proc (/proc/utracer/*) that client apps can read()/write()/ioctl() to access utracer capabilities. Maybe the coolest thing about utracer is that it gave every app its own /proc entry that blocked on read() until an app-defined interesting thing happened: specified signals, task state changes, specified syscall entry/exit, all the stuff accessible through utrace report_* callbacks.

So, how'd it demise in a way that a syscall interface would be any better? This sounds like the right way to do it (barring scaling details; maintaining one fd per thread becomes impractical).

No, utracer only required two fds per client app: two instances of a gdb-replacement, e.g., would require a total of four fds, regardless of how many threads each gdb-thing was following. The module was designed to accept client requests from any number of apps and assign each its pair of fds. (One fd was for write()ing various things to the module to control operations and for ioctl()ing, mostly to extract synchronous data. The other fd was read-only and blocked pending user-defined interesting stuff, kinda like a super waitpid().)

The main reason I'm moving away from this approach is that the overhead of read()/write()/ioctl() to/from a /proc pseudo-entry is a lot higher than a simple syscall. It's mostly a performance thing, but there's also less code to maintain if I get rid of the /proc stuff. (Plus, I couldn't find any existing examples of anyone doing ioctl() to a /proc entry and I kinda had to invent the method myself. It works fine, but that may just be because I haven't found the right way to break it yet.)

Side note: every time someone talks about a ptrace replacement I suggest stealing one from Solaris :-) It seems one of the areas that Sun thought out properly, although in my limited brushes with it in the last year I'm becoming less convinced of that.

What's Solaris ptrace() do that Linux ptrace() doesn't? Nothing says I can't at least hack at putting it in.

-- Chris Moller I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant. -- Robert McCloskey
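The "super waitpid()" fd described in the message is the kind of descriptor a client could hand to poll(). A minimal sketch of the client side follows, under the assumption that each event arrives as a single byte on the read-only fd. That one-byte framing and the function name are hypothetical; the thread does not describe utracer's actual event format.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for one event byte on event_fd.
 * Returns the byte value (0..255), -1 on timeout, -2 on error or hang-up.
 * The one-byte event framing is an assumption made for this sketch. */
int wait_for_event(int event_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = event_fd, .events = POLLIN };

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0)
        return -1;                      /* nothing interesting happened */
    if (rc < 0 || !(pfd.revents & POLLIN))
        return -2;                      /* poll failed, or fd hung up */

    unsigned char ev;
    if (read(event_fd, &ev, 1) != 1)
        return -2;
    return ev;
}
```

Because the wait goes through poll(), one debugger could watch this fd alongside its terminal and sockets in a single loop, and the fd count stays at two per client no matter how many threads are being traced.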
Re: The demise of utracer.
On Wed, Jun 25, 2008 at 04:11:50PM -0400, Chris Moller wrote: The main reason I'm moving away from this approach is that the overhead of read()/write()/ioctl() to/from a /proc pseudo-entry is a lot higher than a simple syscall. It's mostly a performance thing, but there's also less code to maintain if I get rid of the /proc stuff. (Plus, I couldn't find any existing examples of anyone doing ioctl() to a /proc entry and I kinda had to invent the method myself. It works fine, but that may just be because I haven't found the right way to break it yet.)

I'm curious, how can this be? The worst case I can see is two syscalls instead of one, but that's not a lot higher. You can use pread/pwrite to avoid having to seek; it's only an issue when you have a structured request with a detailed reply.

Side note: every time someone talks about a ptrace replacement I suggest stealing one from Solaris :-) It seems one of the areas that Sun thought out properly, although in my limited brushes with it in the last year I'm becoming less convinced of that. What's Solaris ptrace() do that Linux ptrace() doesn't? Nothing says I can't at least hack at putting it in.

Solaris's debug interface is based in proc. No ioctl; you write structured requests to specific files, instead. http://docs.sun.com/app/docs/doc/816-5174/proc-4?l=en&a=view

Its biggest advantage, in my experience so far, is that it is not entwined with wait and signals. Being attached doesn't mess up the debuggee's signals or its parent's wait behavior the way ptrace does.

-- Daniel Jacobowitz CodeSourcery
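To illustrate the pread/pwrite remark: both calls take the file offset as an argument, so a positioned transfer costs one syscall instead of an lseek() followed by read() or write(), and the descriptor's own offset is left alone. The sketch below uses a hypothetical fixed-size request record and made-up function names. It is not the Solaris /proc request format, nor anything from utracer.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size request record, invented for this sketch. */
struct request {
    uint32_t opcode;
    uint32_t arg;
};

/* Store one record at the given slot with a single positioned write. */
int put_request(int fd, off_t slot, const struct request *req)
{
    ssize_t n = pwrite(fd, req, sizeof *req, slot * (off_t)sizeof *req);
    return n == (ssize_t)sizeof *req ? 0 : -1;
}

/* Fetch one record from the given slot with a single positioned read. */
int get_request(int fd, off_t slot, struct request *req)
{
    ssize_t n = pread(fd, req, sizeof *req, slot * (off_t)sizeof *req);
    return n == (ssize_t)sizeof *req ? 0 : -1;
}
```

Against a regular file this round-trips a record without ever moving the file position. Whether a given pseudo-file honours offsets this way depends on how its read and write methods are written.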
Re: The demise of utracer.
From: Daniel Jacobowitz [EMAIL PROTECTED] Date: Wed, 25 Jun 2008 11:38:25 -0400 The single biggest pain in GDB's process management is dealing with signals, especially the ways that ptrace interferes with normal operation. Because of this, and other similar examples, I believe the only way to design a new debugging interface is to walk through a significant debugging tool like GDB and guide the interface design by what something like GDB is trying to accomplish. There are years and years of experience in debugging codified into a code base like GDB, and therefore the perfect place to mine interface guiding experience from.
Re: The demise of utracer.
David Miller wrote: From: Daniel Jacobowitz [EMAIL PROTECTED] Date: Wed, 25 Jun 2008 11:38:25 -0400 The single biggest pain in GDB's process management is dealing with signals, especially the ways that ptrace interferes with normal operation. Because of this, and other similar examples, I believe the only way to design a new debugging interface is to walk through a significant debugging tool like GDB and guide the interface design by what something like GDB is trying to accomplish. There are years and years of experience in debugging codified into a code base like GDB, and therefore the perfect place to mine interface guiding experience from.

I agree in principle, but there are years and years of old cruft in gdb too and I'm not altogether sure that separating the experience from the cruft is possible or, at least, any less work than just starting over and accumulating new cruft.

-- Chris Moller I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant. -- Robert McCloskey