Re: linux-next: add utrace tree
On 02/07/2010 10:54 PM, Pavel Machek wrote:
> > No, it has nothing to do with ring. It has to do with modifying code that another CPU could be executing at the same time, and with modifying code on the same processor through another virtual alias (they are different issues.) The same issues apply regardless of the CPL of the processor.
> ...but these are always 'there could be cpu bugs around' issues, right? Like amd k6. AFAICT x86 always supported self-modifying code without any extra barriers needed...

*Self*-modifying code, yes. *Cross*-modifying code, no.

-hpa
Re: linux-next: add utrace tree
On Mon, 8 Feb 2010 07:54:25 +0100 Pavel Machek pa...@ucw.cz wrote:
> > No, it has nothing to do with ring. It has to do with modifying code that another CPU could be executing at the same time, and with modifying code on the same processor through another virtual alias (they are different issues.) The same issues apply regardless of the CPL of the processor.
> ...but these are always 'there could be cpu bugs around' issues, right? Like amd k6. AFAICT x86 always supported self-modifying code without any extra barriers needed...

self modifying code yes, cross modifying code no.

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: linux-next: add utrace tree
On 01/27/2010 01:05 PM, Ananth N Mavinakayanahalli wrote:
> We don't need to write one. I don't know how easy it is to make the kvm emulator less kvm-centric (vcpus, kvm_context, etc). Avi?

It's a lot of mindless work but not too difficult; replacing hardcoded accessors with function pointers.

--
error compiling committee.c: too many arguments to function
Re: linux-next: add utrace tree
Hi!

> > > > Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream.
> > > Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them.
> > I think the issue was that ring 0 was never meant to do that, where as, ring 3 does it all the time. Doesn't the dynamic library modify its text?
> No, it has nothing to do with ring. It has to do with modifying code that another CPU could be executing at the same time, and with modifying code on the same processor through another virtual alias (they are different issues.) The same issues apply regardless of the CPL of the processor.

...but these are always 'there could be cpu bugs around' issues, right? Like amd k6. AFAICT x86 always supported self-modifying code without any extra barriers needed...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: linux-next: add utrace tree
On Fri, 2010-01-29 at 08:42 +0100, Ingo Molnar wrote:
> * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
> > On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:
> > ...
> > > Let's compare the two cases via a drawing. Your current uprobes submission does:
> > >
> > > [kernel]     do probe thing     single-step trap
> > >              ^            |     ^              |
> > >              |            v     |              v
> > > [user]    INT3            XOL-ins              next ins-stream
> > >
> > > ( add the need for serialization to make sure the whole single-step thing does not get out of sync with reality. )
> > >
> > > An emulator approach would do:
> > >
> > > [kernel]     emul-demux-fastpath, do probe thing
> > >              ^                                 |
> > >              |                                 v
> > > [user]    INT3                                 next ins-stream
> > >
> > > far simpler conceptually, and faster as well, because it's one kernel entry.
> >
> > Ingo,
> >
> > Yes, conceptually, emulation is simpler. In fact, it may even be the right thing to do from a housekeeping POV if gdb were enabled to use breakpoint assistance in the kernel. However... emulation is not easy. Just quoting Peter Anvin:
> >
> >   On the more general rule of interpretation: I'm really concerned about having a bunch of partially-capable x86 interpreters all over the kernel. x86 is *hard* to emulate, and it will only get harder as the architecture evolves. -hpa
>
> This is obviously true for a full emulator. Except for the fact that:
>
> > Yes, I know you suggested we start with a small subset.
>
> and for the fact that we already have emulators in the kernel.

But this would be emulating userspace instructions, correct? The kernel is limited to what instructions it can perform, no floating point for example (of course there are some exceptions). But generally, the instructions in the kernel should be easier to emulate than in userspace. Userspace is free to do any wacky thing it wants. Will this limit the ability to probe apps that take advantage of some strange op code that the user knows is available on their platform?

-- Steve

> Plus we _already_ need to decode instructions for safe kprobing and have the code for that upstream. So it's not like we can avoid decoding the instructions.
> (and emulating certain instruction patterns is really just a natural next step of a good decoder.)
Re: linux-next: add utrace tree
* Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
> On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote:
> ...
> > When we merged kprobes ~10 years ago we made the (rather bad) mistake of merging a raw, opaque facility and leaving 'the rest' up to some other entity. IBM kprobes hackers vanished the day the original kprobes code went upstream and the high level entity never truly materialized in-kernel, for nearly a decade!
> I don't know what you are referring to here... Kprobes was merged in 2.6.9 (~August 2004 -- less than 6 years ago). [...]

Ok, 6 years then :-)

> [...] Since then, we did work on ports to powerpc and s390. We implemented kretprobes. We made it much scalable using RCU; we did the powerpc booster to skip single-step when possible, not to mention various bug fixes over the years.

Except it had no real in-kernel user.

> Yes, we did not do the perf integration, but perf did not exist then, either. Its simply wrong to say people 'vanished'.

It certainly was a bit stale for years - and with no real users that's certainly not a surprise. That has changed recently so i'm not complaining. We just don't want to repeat the same mistake with uprobes.

Ingo
Re: linux-next: add utrace tree
On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote: * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: ... When we merged kprobes ~10 years ago we made the (rather bad) mistake of merging a raw, opaque facility and leaving 'the rest' up to some other entity. IBM kprobes hackers vanished the day the original kprobes code went upstream and the high level entity never truly materialized in-kernel, for nearly a decade! I don't know what you are referring to here... Kprobes was merged in 2.6.9 (~August 2004 -- less than 6 years ago). [...] Ok, 6 years then :-) [...] Since then, we did work on ports to powerpc and s390. We implemented kretprobes. We made it much scalable using RCU; we did the powerpc booster to skip single-step when possible, not to mention various bug fixes over the years. Except it had no real in-kernel user. Not that I want to rebut you Ingo, but there were in-kernel users since 2006 (net/ipv4/tcp_probe.c) :-) Aside, I am also glad that we have more flexibility with the perf integration. Ananth
Re: linux-next: add utrace tree
* Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote: * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: ... When we merged kprobes ~10 years ago we made the (rather bad) mistake of merging a raw, opaque facility and leaving 'the rest' up to some other entity. IBM kprobes hackers vanished the day the original kprobes code went upstream and the high level entity never truly materialized in-kernel, for nearly a decade! I don't know what you are referring to here... Kprobes was merged in 2.6.9 (~August 2004 -- less than 6 years ago). [...] Ok, 6 years then :-) [...] Since then, we did work on ports to powerpc and s390. We implemented kretprobes. We made it much scalable using RCU; we did the powerpc booster to skip single-step when possible, not to mention various bug fixes over the years. Except it had no real in-kernel user. Not that I want to rebut you Ingo, but there were in-kernel users since 2006 (net/ipv4/tcp_probe.c) :-) i said 'real' users. That usage in tcp_probe.c was (and is) really minimal and never expanded really. Aside, I am also glad that we have more flexibility with the perf integration. ok, good :) Ingo
Re: linux-next: add utrace tree
Ingo Molnar mi...@elte.hu writes:
> [...] So, to sum it up: utrace XOL, which is rather complex already, needs even more complexity (which is not yet implemented) than the much simpler common-case emulator approach i outlined, just to break even with the performance of the much simpler approach. [...]

Is it an uncontroversial claim that emulation of CISC instructions should perform better than their native execution, followed by an int3 (as in the simplest working scheme) or boosting (as done by kprobes)? From my experience with simulators, simple software emulation of cpus can be hundreds of times slower or worse than native execution.

- FChE
Re: linux-next: add utrace tree
* Jim Keniston jkeni...@us.ibm.com wrote:
> On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
> ...
> > I think the best solution for user probes (by far) is to use a simplified in-kernel instruction emulator for the few commonly probed instructions. (Kprobes already partially decodes x86 instructions to make it safe to apply accelerated probes and there's other decoding logic in the kernel too.) The design and practical advantages are numerous:
> >
> > - People want to probe their function prologues most of the time ... a single INT3 there will in most cases just hit the initial stack allocation and that's it.
> Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps on x86 (but see below**). [...]

Coverage in practice is all that matters. Consider the fact that i get 1000 times more bugreports aided by strace, which has 1000 times more overhead than even the slowest of uprobes approaches. This simple fact tells us that while performance matters, it is of little use if good utility and a clean design is not there. (in fact sane and clean design will almost automatically result in good performance too down the line, but i digress.) Faster crap is still crap.

> [...] Even there, though, we'd have to address the page fault we'd occasionally get when extending the stack vma.

Nope, in the simplest model not even page fault emulation is needed, get_user()/put_user() would resolve it automatically. You either get the value with the pagefault resolved, or you get a -EFAULT. If you concentrate only on the common case then emulation can be _really_ simple.

Let's compare the two cases via a drawing. Your current uprobes submission does:

 [kernel]     do probe thing     single-step trap
              ^            |     ^              |
              |            v     |              v
 [user]    INT3            XOL-ins              next ins-stream

( add the need for serialization to make sure the whole single-step thing does not get out of sync with reality. )

An emulator approach would do:

 [kernel]     emul-demux-fastpath, do probe thing
              ^                                 |
              |                                 v
 [user]    INT3                                 next ins-stream

far simpler conceptually, and faster as well, because it's one kernel entry.

Generally i get nervous if a piece of instrumentation cannot be expressed in simple ways. _Especially_ if i consider it to concentrate on all the wrong things and doesn't even break even with a far less complex scheme.

What would be the 'right things' to concentrate on? Make sure it's an all-around end-to-end package that is _useful to people_. As of today i have yet to get a _single_ bugreport or kernel improvement requested by an application writer who found out about the inefficiencies in his app using uprobes. There is a gaping hole of utility here, a whole cathedral of tools written that just a handful of ordinary Linux people use. There's a big disconnect and i can say one thing for sure: needless complexity in the wrong places can outright stifle tools from becoming good.

> > We could get quite good coverage (and very fast emulation) for the common case in not too much code - and much of that code we already have available. No re-trapping,
> As previously discussed, boosting would also get rid of the single-step trap for most instructions.

Boosting is not in the uprobes patch-set you submitted. Even with it present it won't get rid of the initial INT3. So basically _best-case_ (with boosting) XOL-uprobes could roughly break even with a pure emulator approach ... That's a big and fundamental difference.

> > no extra instruction patching
> x86_64 rip-relative instructions are the only ones we alter.
> > and complex maintenance of trampolines.
> > - It's as transparent as it gets - no user-space trampoline or other visible state that modifies behavior or can be stomped upon by user-space bugs.
> The XOL vma isn't writable from user space, so I can't think of how it could be clobbered merely by a stray memory reference. [...]
Well there must be some purpose to the instrumentation, there must be some way to save data, right? If yes and it's in user-space, that data is clobberable. If it's in kernel-space then we have to enter the kernel anyway (with similar cost patterns to an INT3 entry) - so we just delayed the kernel entry. So IMHO you have designed in considerable complexity for little immediate benefit. [...] Yes, it's a vma that the unprobed app would never have; and yes, a malicious app or kernel module could remove it or alter the protection and scribble on it. We don't try to defend the app against such malicious attacks, but we do our best to ensure that the kernel side handles such attacks gracefully. - Lightweight and simple probe insertion: no weird setup sequence needing the
Re: linux-next: add utrace tree
On Mon, 2010-01-25 at 08:52 -0800, Linus Torvalds wrote:
> That said, I also suspect that people should still look seriously at simply just improving ptrace. For example, I suspect that the biggest problem with ptrace is really just the signalling, and that creating a new extension for JUST THAT, and then having a model where you can choose - at PTRACE_ATTACH time - how to wait for events would be a good thing.

like returning a fd to poll() on ? :-)

Cheers,
Ben.
Re: linux-next: add utrace tree
On Fri, 29 Jan 2010, Benjamin Herrenschmidt wrote:
> like returning a fd to poll() on ? :-)

Well, there's the possibility of async polling (rather than the synchronous wait that ptrace forces now), but there are other advantages to having a connection model - like not having to look up the child process every time like ptrace does now. Although 'find_task_by_vpid()' is probably cheap enough that nobody really cares. We do a fair job at those hash tables.

Linus
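For readers who want the "fd to poll() on" idea in concrete form, here is a minimal Python sketch of waiting for events through a descriptor instead of a synchronous blocking wait. It is not a ptrace interface: the pipe is a hypothetical stand-in for a per-tracee event fd, and wait_for_event() and demo_poll() are names made up for this illustration. It assumes a Unix-like system where pipes can be polled.

```python
import os
import selectors


def wait_for_event(read_fd, timeout):
    """Wait on a descriptor with a selector instead of a blocking call.

    Returns the bytes read, or None if nothing arrived within `timeout`.
    """
    with selectors.DefaultSelector() as sel:
        sel.register(read_fd, selectors.EVENT_READ)
        ready = sel.select(timeout)
        if not ready:
            return None
        return os.read(read_fd, 64)


def demo_poll():
    # Hypothetical stand-in for a tracing fd: the write end plays the tracee.
    r, w = os.pipe()
    try:
        nothing = wait_for_event(r, 0)   # no event pending: returns at once
        os.write(w, b"stopped")          # the "tracee" reports an event
        event = wait_for_event(r, 1.0)   # the event is now readable
        return nothing, event
    finally:
        os.close(r)
        os.close(w)
```

The point of the connection model is visible in the first call: the caller can ask "is there anything?" and carry on, and the same selector could watch many such descriptors at once.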
Re: linux-next: add utrace tree
On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote: * Jim Keniston jkeni...@us.ibm.com wrote: On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote: ... Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps on x86 (but see below**). [...] ... [...] Even there, though, we'd have to address the page fault we'd occasionally get when extending the stack vma. Nope, in the simplest model not even page fault emulation is needed, get_user()/put_user() would resolve it automatically. If you either get the value with the pagefault resolved, or you get a -EFAULT. get_user()/put_user() have to be done in a context where you can sleep, right? Uprobes currently operates in such contexts, but there's some talk of moving it all to a DIE_INT3 notifier context, where it can't sleep. ... We could get quite good coverage (and very fast emulation) for the common case in not too much code - and much of that code we already have available. No re-trapping, As previously discussed, boosting would also get rid of the single-step trap for most instructions. Boosting is not in the uprobes patch-set you submitted. Even with it present it wont get rid of the initial INT3. So basically _best-case_ (with boosting) XOL-uprobes could roughly break even with a pure emulator approach ... That's a big and fundamental difference. To be fair, wrt uprobes, emulation and boosting are both in the same state: pretty well understood, but not yet implemented. ... - It's as transparent as it gets - no user-space trampoline or other visible state that modifies behavior or can be stomped upon by user-space bugs. The XOL vma isn't writable from user space, so I can't think of how it could be clobbered merely by a stray memory reference. [...] Well there must be some purpose to the instrumentation, there must be some way to save data, right? If yes and it's in user-space, that data is clobberable. 
One or two others have advocated an approach (which eliminates the breakpoint trap) where trace data is stored in the uprobe vma, but I haven't. (In such a case, XOL vma would be a misnomer.) I agree that in such a scenario, the uprobe vma would of necessity be writable by the app. If it's in kernel-space then we have to enter the kernel anyway (with similar cost patterns to an INT3 entry) - so we just delayed the kernel entry. This seems to presume that you have to extract trace data from the kernel every time a probe is hit. In actual practice, you're often just checking for unusual arg values, incrementing a counter, or some such. ... Even if we add emulation, it seems sensible to keep the XOL approach as a backup to handle instructions that aren't yet emulated (and architectures that don't yet have emulators). That way, if you don't probe any unemulated instructions, the XOL vma is never created. To turn the argument around: an in-kernel emulator is an all-around facility to make sure we probe safely and securely, _and_ it is also more portable because it's simpler (because more gradual) to implement on a new architecture as you dont actually have to copy around instructions (and make sure they work in that new place), but have to emulate a limited subset of the instruction space, on purely local state. I understand the desire to start small and simple and grow gradually from there. We thought we were doing that. Single-stepping out of line has been in use for close to a decade, maybe more; and boosting (in kprobes) has been around for a few years as well. To the *probes folks, it feels pretty solid. ... With an emulator (assuming the emulator is correct) we can execute the precise semantics of that instruction in that place - without any side-effects from trampolining/replacement. And of course, our view has been that the best way to achieve the effect of the instruction, including all desired side-effects, is to execute the instruction on the CPU. ... 
**In practice, we've had to probe all sorts of instructions, including FP instructions -- especially where you want to exploit the debug info to get the names, types, and locations of variables and args. For some compilers and architectures, the debug info isn't reliable until the end of the function prologue, at which point you could find any old instruction. Ditto if you want to probe statements within a function. For those cases, frankly, the right approach is to fix the debug info (or introduce a new one) and forget the old crap. You treat debuginfo as some god-given property, while it's one of the suckiest aspects of all of Linux. But we've had that discussion months (and years) ago. It has improved in gcc 4.5 so there's some hope. Yes, there seems to be considerable movement toward better debug info -- which could make statement probing (and not just
Re: linux-next: add utrace tree
On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:
...
> Let's compare the two cases via a drawing. Your current uprobes submission does:
>
> [kernel]     do probe thing     single-step trap
>              ^            |     ^              |
>              |            v     |              v
> [user]    INT3            XOL-ins              next ins-stream
>
> ( add the need for serialization to make sure the whole single-step thing does not get out of sync with reality. )
>
> An emulator approach would do:
>
> [kernel]     emul-demux-fastpath, do probe thing
>              ^                                 |
>              |                                 v
> [user]    INT3                                 next ins-stream
>
> far simpler conceptually, and faster as well, because it's one kernel entry.

Ingo,

Yes, conceptually, emulation is simpler. In fact, it may even be the right thing to do from a housekeeping POV if gdb were enabled to use breakpoint assistance in the kernel. However... emulation is not easy. Just quoting Peter Anvin:

  On the more general rule of interpretation: I'm really concerned about having a bunch of partially-capable x86 interpreters all over the kernel. x86 is *hard* to emulate, and it will only get harder as the architecture evolves. -hpa

Yes, I know you suggested we start with a small subset. We already have an implementation of instruction emulation in kernel for x86 and powerpc, but it's too KVM-centric. If there is a generic emulation layer, we would use it.

There are conflicting opinions for either case; complicated as it is, the XOL scheme works and, to a large extent, it is easily extendable to other architectures compared to the emulation approach. Uprobes can be made to use emulation when possible/available, but I don't think this should be a gating decision for the initial implementation of the feature.

Ananth
Re: linux-next: add utrace tree
* Peter Zijlstra pet...@infradead.org wrote:
> On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote:
> > On Tue, 26 Jan 2010, Tom Tromey wrote:
> > > In non-stop mode (where you can stop one thread but leave the others running), gdb wants to have the breakpoints always inserted. So, something must emulate the displaced instruction.
> > I'm almost totally uninterested in breakpoints that actually re-write instructions. It's impossible to do that efficiently and well, especially in threaded environments. So if you do instruction rewriting, I can only say that's your problem.
> Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream.
> I'm all in favor of not doing that extra vma and instead use stack or TLS space, but then people complain about having to make that executable (which is something I don't really mind, x86 had executable everything for very long, and also, its only so when debugging the thing anyway).

I think the best solution for user probes (by far) is to use a simplified in-kernel instruction emulator for the few commonly probed instructions. (Kprobes already partially decodes x86 instructions to make it safe to apply accelerated probes and there's other decoding logic in the kernel too.)

The design and practical advantages are numerous:

- People want to probe their function prologues most of the time ... a single INT3 there will in most cases just hit the initial stack allocation and that's it. We could get quite good coverage (and very fast emulation) for the common case in not too much code - and much of that code we already have available. No re-trapping, no extra instruction patching and complex maintenance of trampolines.

- It's as transparent as it gets - no user-space trampoline or other visible state that modifies behavior or can be stomped upon by user-space bugs.

- Lightweight and simple probe insertion: no weird setup sequence needing the stopping of all tasks to install the trampoline. We just add the INT3 and off you go.

- Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on task local state.

- The points we can probe are never truly limited as it's all freely upscalable: if you cannot probe an instruction you want to probe today, extend the emulator. Deny the rest. _All_ versions of uprobes code i've seen so far already restrict the probe-compatible instruction set: RIP-relative instructions are excluded on 64-bit for example.

- Emulation has the _least_ semantical side effects as we really execute 'that' instruction - not some other instruction put elsewhere into a special vma or into the process/thread stack, or some special in-kernel trampoline, etc.

- Emulation can be very fast for the common case as well. Nobody will probe weird, complex instructions. They will use 'perf probe' to insert probes into their functions 90% of the time ...

- FPU and complex ops and pagefault emulation is not really what i'd expect to be necessary for simple probing - but it _can_ be added by people who care about it, if they so wish.

Such a scheme would be _far_ more preferable from a maintenance POV as well, as the initial code will be small, and we can extend it gradually. All the other proposals are complex 'all or nothing' schemes with no flexibility for complexity at all.

Thanks,

Ingo
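To make the "emulate the common case, deny the rest" idea concrete, here is a small Python sketch of what emulating a single prologue instruction on task-local state could look like. It is a toy model and not kernel code: emulate(), put_user_u64(), the register dictionary and the bytearray standing in for the stack are all hypothetical names chosen for this illustration. The only x86 fact relied upon is that 0x55 is the one-byte encoding of push %rbp (push %ebp on 32-bit).

```python
import errno
import struct

PUSH_RBP = 0x55  # one-byte encoding of `push %rbp`


def put_user_u64(mem, addr, value):
    """Stand-in for put_user(): store 8 bytes or report -EFAULT."""
    if addr < 0 or addr + 8 > len(mem):
        return -errno.EFAULT
    struct.pack_into("<Q", mem, addr, value)
    return 0


def emulate(opcode, regs, mem):
    """Emulate one instruction using only task-local state.

    Returns 0 on success, -EFAULT on a bad stack access, or None when the
    opcode is outside the supported subset (the probe would be denied).
    """
    if opcode != PUSH_RBP:
        return None                    # not emulated: deny the probe
    new_sp = regs["rsp"] - 8
    err = put_user_u64(mem, new_sp, regs["rbp"])
    if err:
        return err                     # registers are left untouched on a fault
    regs["rsp"] = new_sp
    regs["rip"] += 1                   # step over the one-byte instruction
    return 0
```

The store either succeeds or fails as a unit, which mirrors the "value or -EFAULT" behaviour described for get_user()/put_user(); anything outside the supported subset is simply refused rather than half-handled.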
Re: linux-next: add utrace tree
On Wed, 27 Jan 2010, Peter Zijlstra wrote:
> Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream.

Just out of interest, how does it handle the threading issue?

Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the "overwrite only the first byte with 'int3'" made them go "umm, I need to talk to some core CPU people to see if that's ok". They mumble about possible CPU errata, I$ coherency, instruction retry etc.

I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all.

Linus
Re: linux-next: add utrace tree
On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
> On Wed, 27 Jan 2010, Peter Zijlstra wrote:
> > Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream.
> Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the "overwrite only the first byte with 'int3'" made them go "umm, I need to talk to some core CPU people to see if that's ok". They mumble about possible CPU errata, I$ coherency, instruction retry etc.
> I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all.

Right, so there's two aspects:

 1) concurrency when inserting the probe
 2) concurrency when hitting the probe

1) used to be dealt with by using utrace to stop all threads in the process and then writing the instruction. I suggested to CoW the page, modify the instruction, set the pagetable and flush tlbs at full speed -- the very thing you suggest here.

2) the traditional approach (and the intel arch manual describes this) is to put the original instruction back, single step it, and write the probe back. This is racy for multi-threading.

The current uprobes stuff solves this by doing single-step-out-of-line (XOL). XOL injects a new vma into the target process and puts the old instruction there, then it single steps on the new location, leaving the original site with INT3. This doesn't work for things like RIP relative instructions, so uprobes considers them un-probable.

Also, I myself really object to inserting a vma in a running process, it's like a landlord: sure, he has the key, but he won't come in and poke through your things.

The alternative is to place the instruction in TLS or stack space, since each thread can only have a single trap at a time, you only need space for 1 instruction (plus a possible jump out to the original site). There is the 'problem' of marking the TLS/stack executable when being probed.

Then there is the whole emulation angle, the uprobes people basically say it's too much effort to write an x86 emulator.
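The difference between the traditional restore/step/re-arm scheme and single-step-out-of-line can be shown with a toy model. The Python sketch below is illustrative only: XolProbe and inline_step_window() are hypothetical names, a bytearray stands in for the process text, and no real single-stepping takes place. It only tracks what the probed address contains at each moment.

```python
INT3 = 0xCC  # x86 breakpoint opcode


class XolProbe:
    """Toy model of single-step-out-of-line; all names are hypothetical."""

    def __init__(self, text, addr):
        self.text = text            # bytearray standing in for process text
        self.addr = addr
        self.saved = text[addr]     # original byte, kept out of line
        text[addr] = INT3           # the site keeps INT3 for the probe's lifetime

    def hit(self):
        """Return the byte a thread would single-step from the XOL slot."""
        assert self.text[self.addr] == INT3
        return self.saved           # the original site is never rewritten


def inline_step_window(text, addr, saved):
    """Traditional scheme: restore, step, re-arm.

    Returns what other threads would see at the site during the step.
    """
    text[addr] = saved              # probe temporarily removed
    visible = text[addr]            # a second thread arriving now misses the probe
    text[addr] = INT3               # re-arm after the single-step
    return visible
```

With XOL the probed byte stays INT3 throughout, so a second thread reaching the site still traps; in the inline scheme there is a window in which it sees the original instruction and runs past the probe.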
Re: linux-next: add utrace tree
On Wed, 2010-01-27 at 11:55 +0100, Peter Zijlstra wrote:
> Right, so there's two aspects:
>  1) concurrency when inserting the probe
>  2) concurrency when hitting the probe
> 1) used to be dealt with by using utrace to stop all threads in the process and then writing the instruction. I suggested to CoW the page, modify the instruction, set the pagetable and flush tlbs at full speed -- the very thing you suggest here.

Also, since executable maps are typically MAP_PRIVATE, you have to CoW anyway in order to modify it and I would exclude MAP_SHARED from being probable because then the modification could seep through into whatever was backing that thing.
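The MAP_PRIVATE point can be seen from user space with a short Python sketch. It assumes only the standard mmap module, whose ACCESS_COPY mode creates a private copy-on-write mapping; patch_private_copy() and demo_cow() are names made up for this illustration, and the five bytes are just a tiny x86-64 function body used as sample data. It does not touch page tables or TLBs, it only shows that a write through a private mapping stays out of the backing file.

```python
import mmap
import os
import tempfile

INT3 = 0xCC  # x86 breakpoint opcode


def patch_private_copy(path, offset):
    """Write INT3 through a private (copy-on-write) mapping of `path`.

    Returns (byte seen through the mapping, byte still in the file).
    """
    with open(path, "r+b") as f:
        # ACCESS_COPY: writes land in a private copy and never reach the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            m[offset] = INT3
            patched = m[offset]
    with open(path, "rb") as f:
        f.seek(offset)
        on_disk = f.read(1)[0]
    return patched, on_disk


def demo_cow():
    # push %rbp; mov %rsp,%rbp; ret
    code = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, code)
        os.close(fd)
        return patch_private_copy(path, 0)
    finally:
        os.unlink(path)
```

A shared mapping would behave differently: the same write would reach the file, which is the "seep through" concern raised for MAP_SHARED.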
Re: linux-next: add utrace tree
On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote: On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: On Wed, 27 Jan 2010, Peter Zijlstra wrote: Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream. Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the overwrite only the first byte with 'int3' made them go umm, I need to talk to some core CPU people to see if that's ok. They mumble about possible CPU errata, I$ coherency, instruction retry etc. I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all. Right, so there's two aspects: 1) concurrency when inserting the probe 2) concurrency when hitting the probe 1) used to be dealt with by using utrace to stop all threads in the process and then writing the instruction. I suggested to CoW the page, modify the instruction, set the pagetable and flush tlbs at full speed -- the very thing you suggest here. 2) so traditionally (and the intel arch manual describes this) is to replace the instruction, single step it, and write the probe back. This is racy for multi-threading. The current uprobes stuff solves this by doing single-step-out-of-line (XOL). XOL injects a new vma into the target process and puts the old instruction there, then it single steps on the new location, leaving the original site with INT3. 
This doesn't work for things like RIP relative instructions, so uprobes considers them un-probable. Probing RIP-relative instructions work just fine; there are fixups that take care of it. Also, I myself really object to inserting a vma in a running process, its like a land-lord, sure he has the key but he won't come in an poke through your things. The alternative is to place the instruction in TLS or stack space, since each thread can only have a single trap at a time, you only need space for 1 instruction (plus a possible jump out to the original site). There is the 'problem' of marking the TLS/stack executable when being probed. Then there is the whole emulation angle, the uprobes people basically say its too much effort to write a x86 emulator. We don't need to write one. I don't know how easy it is to make the kvm emulator less kvm-centric (vcpus, kvm_context, etc). Avi? Ananth
Re: linux-next: add utrace tree
On Wed, 27 Jan 2010, Peter Zijlstra wrote: Right, so there's two aspects: 1) concurrency when inserting the probe That's the one I worried about. Stopping all threads will fix it, obviously at a disastrous performance cost, but what do I care? As noted, there are ways to do it safely with TLB switching, so it's fixable. 2) concurrency when hitting the probe Yeah, I didn't worry about this part, since the only solution is the out-of-line one, and I don't much care how the memory gets allocated for it. Inserting a whole new vma seems pretty drastic, but compared to stopping all threads, it's a small thing. Linus
Re: linux-next: add utrace tree
On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote: Probing RIP-relative instructions works just fine; there are fixups that take care of it. Ah my bad then, it was my understanding you simply bailed on those. Just for my information, how large are the replacement sequences?
Re: linux-next: add utrace tree
On Wed, Jan 27, 2010 at 12:08:31PM +0100, Peter Zijlstra wrote: On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote: Probing RIP-relative instructions works just fine; there are fixups that take care of it. Ah my bad then, it was my understanding you simply bailed on those. Just for my information, how large are the replacement sequences? The RIP relative instruction is transformed into indirect addressing mode using a scratch register. For details http://marc.info/?l=linux-kernel&m=126401936114639&w=2. Ananth
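As a rough illustration of the transformation described above, the sketch below rewrites the ModRM byte of an x86-64 instruction so that a RIP-relative operand (mod=00, rm=101) becomes a scratch-register-plus-disp32 operand (mod=10). The function name and parameters are hypothetical, and REX prefixes are ignored for simplicity, so this is a sketch under stated assumptions rather than the uprobes fixup code itself. Before stepping the copy, the scratch register would have to be loaded with the address that RIP would have held after the original instruction, and its old value restored afterwards.

```python
def rewrite_rip_relative(insn: bytes, modrm_off: int, scratch: int = 0) -> bytes:
    """Return a copy of insn whose RIP-relative operand uses [scratch + disp32].

    modrm_off is the offset of the ModRM byte inside insn; scratch is a
    register number 0..7 (0 = rax, 1 = rcx). Assumes no REX prefix.
    """
    modrm = insn[modrm_off]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if not (mod == 0b00 and rm == 0b101):
        raise ValueError("operand is not RIP-relative")
    if scratch == 0b100:
        raise ValueError("rm=100 selects a SIB byte, so rsp cannot be the scratch")
    if scratch == reg:
        raise ValueError("scratch register collides with the reg operand")
    # mod=10 means [register + disp32]; the 4-byte displacement is kept as is.
    new_modrm = (0b10 << 6) | (reg << 3) | scratch
    return insn[:modrm_off] + bytes([new_modrm]) + insn[modrm_off + 1:]


# mov 0x1234(%rip),%ebx  ->  mov 0x1234(%rax),%ebx
original = bytes.fromhex("8b1d34120000")
rewritten = rewrite_rip_relative(original, modrm_off=1)
```

The rewritten instruction has the same length as the original, which answers the size question for this simple case: only one byte changes.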
Re: linux-next: add utrace tree
[ Added Arjan ] On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: On Wed, 27 Jan 2010, Peter Zijlstra wrote: Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream. Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. I think the issue was that ring 0 was never meant to do that, where as, ring 3 does it all the time. Doesn't the dynamic library modify its text? -- Steve Even the overwrite only the first byte with 'int3' made them go umm, I need to talk to some core CPU people to see if that's ok. They mumble about possible CPU errata, I$ coherency, instruction retry etc. I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all. Linus
Re: linux-next: add utrace tree
On 01/27/2010 02:43 AM, Linus Torvalds wrote: On Wed, 27 Jan 2010, Peter Zijlstra wrote: Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream. Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the overwrite only the first byte with 'int3' made them go umm, I need to talk to some core CPU people to see if that's ok. They mumble about possible CPU errata, I$ coherency, instruction retry etc. We actually went through a review of that here at Intel. We do not yet have an *official* answer (in order for us to have that we have to have it approved by the architecture committee and published in the SDM), but to the best of our current knowledge (and I'm allowed to say this) the int3 method followed by global IPIs should be safe for modifying *one (atomic) instruction*. This is a specific case of a more general rule, but I don't want to disclose the whole rule until it has been officially approved. I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all. On the more general rule of interpretation: I'm really concerned about having a bunch of partially-capable x86 interpreters all over the kernel. x86 is *hard* to emulate, and it will only get harder as the architecture evolves. -hpa
Re: linux-next: add utrace tree
On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote: ... I think the best solution for user probes (by far) is to use a simplified in-kernel instruction emulator for the few common probes instruction. (Kprobes already partially decodes x86 instructions to make it safe to apply accelerated probes and there's other decoding logic in the kernel too.) The design and practical advantages are numerous: - People want to probe their function prologues most of the time ... a single INT3 there will in most cases just hit the initial stack allocation and that's it. Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps on x86 (but see below**). Even there, though, we'd have to address the page fault we'd occasionally get when extending the stack vma. We could get quite good coverage (and very fast emulation) for the common case in not too much code - and much of that code we already have available. No re-trapping, As previously discussed, boosting would also get rid of the single-step trap for most instructions. no extra instruction patching x86_64 rip-relative instructions are the only ones we alter. and complex maintenance of trampolines. - It's as transparent as it gets - no user-space trampoline or other visible state that modifies behavior or can be stomped upon by user-space bugs. The XOL vma isn't writable from user space, so I can't think of how it could be clobbered merely by a stray memory reference. Yes, it's a vma that the unprobed app would never have; and yes, a malicious app or kernel module could remove it or alter the protection and scribble on it. We don't try to defend the app against such malicious attacks, but we do our best to ensure that the kernel side handles such attacks gracefully. - Lightweight and simple probe insertion: no weird setup sequence needing the stopping of all tasks to install the trampoline. We just add the INT3 and off you go. 
FWIW, we don't stop all threads to set up or extend the XOL vma, which is typically a one-time event. We just grab a mutex, in case multiple threads hit previously-unhit probepoints simultaneously, and simultaneously decide that the XOL area needs to be created or extended. - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on task local state. The posted uprobes implementation is, so far as we can tell through code inspection and testing, also thread-safe and SMP-safe. - The points we can probe are never truly limited as it's all freely upscalable: if you cannot probe an instruction you want to probe today, extend the emulator. I don't see how ripping out existing support for almost* the entire instruction set, and then putting it back instruction by instruction, patch by patch, is a win. Even if we add emulation, it seems sensible to keep the XOL approach as a backup to handle instructions that aren't yet emulated (and architectures that don't yet have emulators). That way, if you don't probe any unemulated instructions, the XOL vma is never created. Deny the rest. _All_ versions of uprobes code i've seen so far already restricts the probe-compatible instruction set: *Yes, we currently decline to probe some instructions that look troublesome and we haven't taken the time to test. These include things like privileged instructions, int*, in*/out*, and instructions that fuss with the segment registers. We've never actually seen such instructions in user apps. RIP-relative instructions are excluded on 64-bit for example. No. As discussed in previous posts, we handle rip-relative instructions. - Emulation has the _least_ semantical side effects as we really execute 'that' instruction - It seems to me that emulation is the only approach that DOESN'T execute the probed instruction. not some other instruction put elsewhere into a special vma or into the process/thread stack, or some special in-kernel trampoline, etc. 
- Emulation can be very fast for the common case as well. Nobody will probe weird, complex instructions. They will use 'perf probe' to insert probes into their functions 90% of the time ... - FPU and complex ops and pagefault emulation is not really what i'd expect to be necessary for simple probing - but it _can_ be added by people who care about it, if they so wish. **In practice, we've had to probe all sorts of instructions, including FP instructions -- especially where you want to exploit the debug info to get the names, types, and locations of variables and args. For some compilers and architectures, the debug info isn't reliable until the end of the function prologue, at which point you could find any old instruction. Ditto if you want to probe statements within a function. Such a scheme would be _far_ more preferable from a maintenance POV as well, as the initial code will be small, and we can extend it gradually. All the
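For readers wondering what emulating the common prologue instruction would involve, here is a small Python sketch of `push %ebp` acting purely on task-local state. The register dictionary, the byte-array stack, and the function name are all hypothetical stand-ins. The bounds check marks the spot where the page fault mentioned above (extending the stack vma) would have to be handled.

```python
import struct


def emulate_push_ebp(regs: dict, stack: bytearray, stack_base: int) -> None:
    """Emulate the one-byte 32-bit x86 instruction `push %ebp` (opcode 0x55).

    regs holds 'eip', 'esp' and 'ebp'; stack models the bytes mapped at
    addresses [stack_base, stack_base + len(stack)).
    """
    new_esp = (regs["esp"] - 4) & 0xFFFFFFFF
    off = new_esp - stack_base
    if not 0 <= off <= len(stack) - 4:
        # A real implementation would have to fault in or extend the stack here.
        raise MemoryError("push would fault: stack needs extending")
    struct.pack_into("<I", stack, off, regs["ebp"])  # store little-endian ebp
    regs["esp"] = new_esp
    regs["eip"] = (regs["eip"] + 1) & 0xFFFFFFFF     # skip the 1-byte opcode


regs = {"eip": 0x08048000, "esp": 0x1010, "ebp": 0xDEADBEEF}
stack = bytearray(16)            # models addresses 0x1000..0x100F
emulate_push_ebp(regs, stack, stack_base=0x1000)
```

Only the trapping thread's registers and stack are touched, which is the sense in which emulation acts on task-local state.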
Re: linux-next: add utrace tree
On Fri 2010-01-22 08:43:18, valdis.kletni...@vt.edu wrote: On Fri, 22 Jan 2010 10:51:39 +0530, Ananth N Mavinakayanahalli said: FWIW, Oleg's implementation of ptrace over utrace is 100% compatible with legacy ptrace; gdb testsuite indicates that (http://lkml.org/lkml/2009/12/21/98). No, that only proves it's compatible enough for gdb to not care. The problem is all those *other* packages that abuse ptrace in totally crackhead ways. (No, I can't name them - but ptrace is the sort of interface that almost encourages its use for things somewhere between crackhead and mad-scientist, so they're almost certainly out there.. WAY out there.. :) strace, subterfugue, ltrace, ...? Plus various homegrown sandboxing tools... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: linux-next: add utrace tree
On Mon, Jan 25, 2010 at 01:41:57PM -0800, Linus Torvalds wrote: On Mon, 25 Jan 2010, Tom Tromey wrote: ... * Support displaced stepping in the kernel; I think this would improve performance when debugging in non-stop mode. Don't we already do that at least on x86? Just doing a single-step should work on an instruction even if it has a breakpoint on it, because we set the TF bit. Or maybe I'm not understanding what displaced stepping means to you. If Tom is referring to supporting single-stepping out of line, ie., not putting back the original instruction at the bp location, yes, we already support it on various architectures for kernel breakpoints, through the kprobes infrastructure. For userspace, there are more complications to take care of. We are reworking a prototype based on community comments (see the long UBP/XOL thread on lkml from a few days ago). Hopefully the userspace breakpoint assistance layer will be generic enough for gdb to also take advantage of, though the interface details need to be hashed out. Ananth
Re: linux-next: add utrace tree
Hi - On Mon, Jan 25, 2010 at 02:05:54PM -0700, Tom Tromey wrote: [...] Nevertheless, if the Linux kernel were to present a new user-space API, and if it had an advantage over ptrace, then we would port GDB to use it. There are other platforms where, IIRC, we now use some /proc thing instead of ptrace. There are definitely things we would like from such an API. Here's a few I can think of immediately, there are probably others. * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. [...] Relatedly, don't mess with the inferior's parentage. This is satisfied by the gdbstub prototype. * Support displaced stepping in the kernel [...] I believe this is tantamount to hardware breakpoint support, which is already present (via optional uprobes). * Support some kind of breakpoint expression in the kernel; this would improve performance of conditional breakpoints. Perhaps the existing gdb agent expressions could be used. This is in the todo list. And that KILLER FEATURE of running strace plus gdb on the same process? It *already works* with the gdbstub, and unmodified strace + gdb, thanks to utrace multiplexing process control. It is still artificially restricted in many ways, but this sort of thing is ready for testing:
% process [1]
% strace -o FILE -p
% gdb process
(gdb) target remote /proc//gdb
(gdb) backtrace
(gdb) cont
(gdb) ^D
% [process continues]
% cat FILE
[...]
% kill
- FChE
Re: linux-next: add utrace tree
On Mon, Jan 25, 2010 at 04:07:21PM -0800, Linus Torvalds wrote: On Tue, 26 Jan 2010, Renzo Davoli wrote: The solution is that everybody can code his/her optimized kernel/user interface for tracing in his/her kernel module, i.e. utrace. I don't think people understand. That is simply not a solution. That is a PROBLEM. The thing you describe is an absolute disaster. Which is exactly why I rant against it. The last thing we want to have is here, take this, and make your own kernel module mess around it optimized for your particular crazy scenario. But every SINGLE post in this thread that has argued for utrace has argued exactly this way. I haven't followed much of the utrace discussions, but my impression was that utrace primarily is a cleanup effort, replacing "don't change it, you might break it" code with a clean, well defined (and even documented) implementation. To make it easier for people not familiar with the low-level architecture details to experiment with debugging stuff. Two points to consider: 1. If you'd merge utrace + ptrace-on-utrace, but never anything else which uses the utrace API, wouldn't it still be an improvement? 2. A well defined utrace API makes debugging code more hackable, thus more likely that someone might come up with a brilliant killer debug feature in the future. (This might sound lame, but there are already a few people doing crazy things with utrace while I'm not aware that people have done such experiments based on the current ptrace impl.) BTW, the ptrace improvements discussed elsewhere in this thread (like using an fd instead of signals/wait) are orthogonal to utrace, no? IMHO it's a separate discussion. Johannes
Re: linux-next: add utrace tree
On Tue, 26 Jan 2010, Johannes Stezenbach wrote: 1. If you'd merge utrace + ptrace-on-utrace, but never anything else which uses the utrace API, wouldn't it still be an improvement? I already said earlier that I'd be perfectly happy to merge utrace code, as long as it was clear that I'm not merging a platform for crazy work. IOW, the end result might be merging 99% of the code, but I want to set people's _expectations_ right. I'm not at all interested in merging stuff that has various exported helper functions for people doing random things, but I could happily merge stuff that cleans up internal implementation. 2. A well defined utrace API makes debugging code more hackable, thus more likely that someone might come up with a brilliant killer debug feature in the future. I don't really agree. Clean code makes things easier to improve, and maybe utrace cleans things up. But defining new API's makes me very worried, and quite frankly, the last thing I ever want to see is a new interface that out-of-tree modules start using for random hacking. So I'd be much happier without the whole utrace kernel interface and callbacks, and very much would want to avoid the whole issue of plugins. I'd like to see ptrace improvements - not something else. In other words, I'd much much rather keep the utrace thing _internal_ to ptrace. If people have performance complaints about ptrace, let's look at fixing those _as_such_, rather than look at new modules etc. BTW, the ptrace improvements discussed elsewhere in this thread (like using an fd instead of signals/wait) are orthogonal to utrace, no? IMHO it's a separate discussion. Largely, yes. Tied together to some degree of course, but the whole issue of code cleanup can be seen as a reasonably independent first step (while moving to a fd-based interface should probably not be done without some cleanup first, so they _are_ somewhat tied together). Linus
Re: linux-next: add utrace tree
Tom Tromey tro...@redhat.com writes: * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Internally we're already using a self-pipe to integrate this into gdb's main loop. Relatedly, don't mess with the inferior's parentage. How would having a kernel based solution be better over your user space simulation? BTW there's the new signalfd() system call that might do it (haven't checked if it works for SIGCHLD) * Support displaced stepping in the kernel; I think this would improve performance when debugging in non-stop mode. Not sure what displaced stepping is exactly, but it sounds like the branch tracing extensions that got added a few releases ago? On modern Intel chips they give you a branch buffer in memory. -Andi -- a...@linux.intel.com -- Speaking for myself only.
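The self-pipe arrangement mentioned above can be sketched in a few lines of Python. This is an illustration only, not gdb's code: the function name is hypothetical, and the standard-library call `signal.set_wakeup_fd` stands in for a hand-written signal handler that writes to a pipe. The child's exit shows up as a readable descriptor that a select-based main loop can multiplex, although the mechanism underneath is still SIGCHLD plus waitpid.

```python
import os
import select
import signal


def wait_for_child_via_pipe(child_main) -> int:
    """Fork child_main() and learn of its exit through a pipe descriptor."""
    rfd, wfd = os.pipe()
    os.set_blocking(wfd, False)  # set_wakeup_fd requires a non-blocking fd
    # A Python-level handler must be installed for the wakeup fd to be written.
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_fd = signal.set_wakeup_fd(wfd)
    try:
        pid = os.fork()
        if pid == 0:
            code = 1
            try:
                code = child_main()
            finally:
                os._exit(code)   # never return into the parent's code path
        while True:
            # A real main loop would pass its other descriptors here as well.
            select.select([rfd], [], [])
            os.read(rfd, 64)     # drain the queued signal-number bytes
            done, status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                if os.WIFEXITED(status):
                    return os.WEXITSTATUS(status)
                return -os.WTERMSIG(status)
    finally:
        signal.set_wakeup_fd(old_fd)
        signal.signal(signal.SIGCHLD, old_handler)
        os.close(rfd)
        os.close(wfd)


exit_code = wait_for_child_via_pipe(lambda: 7)
```

Because the handler and the pipe are process-wide, any library that also installs a SIGCHLD handler or reaps children would interfere with this, so it must run in the main thread of a program that owns its signal handling.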
Re: linux-next: add utrace tree
On Tue, 26 Jan 2010, Andi Kleen wrote: Tom Tromey tro...@redhat.com writes: * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Internally we're already using a self-pipe to integrate this into gdb's main loop. Relatedly, don't mess with the inferior's parentage. How would having a kernel based solution be better over your user space simulation? Oh, the reason we should do something in the kernel is that you really can't do certain things with the ptrace() interface. For example, think about how Wine and UML use ptrace - and then realize that that makes it impossible to attach a debugger from the outside. That's a real deficiency in ptrace - much more so than the fact that there are some odd details (ie the whole read/write a word at a time is just a quirky detail in comparison - not a fundamental problem). BTW there's the new signalfd() system call that might do it (haven't checked if it works for SIGCHLD) No, you miss the point. The problem isn't that you want to turn signals into a file descriptor just because you like file descriptors. The problem is that anything that is based on reparenting and signals is fundamentally a one parent only kind of interface. See? So the reason I think using an fd is a good idea is _not_ because gdb already uses an fd internally, but because it gives you a connection between the debugger and debuggee that is not fundamentally limited to a single controller. (It doesn't have to be a file descriptor, of course, but could be any kind of other model that allows multiple connections. It's just that in unix terms, using a file descriptor as the cookie for the connection is a very natural model. So the important part isn't the file descriptor itself, it's the model you could build). Linus
Re: linux-next: add utrace tree
The problem is that anything that is based on reparenting and signals is fundamentally a one parent only kind of interface. See? I was actually thinking about that before I wrote the email. But when I did that I couldn't come up with a good scenario where multiple debuggers actually make sense. In a sense being a debugger is really a very intimate thing for a process. Do you really want to have multiple of them messing with each other? If yes how would they know what to touch and what not? The only thing I could think of was user space virtualization (like old UML) together with a real debugger, but frankly these solutions all seemed like big race conditions to me anyways and should be better done in the kernel or below it, so I have a hard time taking them seriously. Can you think of any scenario where multiple debuggers on a process make sense? -Andi
Re: linux-next: add utrace tree
On 01/26, Linus Torvalds wrote: The problem is that anything that is based on reparenting and signals is fundamentally a one parent only kind of interface. See? Indeed. signals + do_wait() is the horrible model. So the reason I think using an fd is a good idea is _not_ because gdb already uses an fd internally, but because it gives you a connection between the debugger and debuggee that is not fundamentally limited to a single controller. (It doesn't have to be a file descriptor, of course, but could be any kind of other model that allows multiple connections. Yes. But then we need something which represents this connection in kernel: utrace_engine. Then we need something which allows multiple tracers to cooperate. Just for example, one tracer wants to resume the tracee, another tracer wants the tracee to be stopped. Utrace does this. And, since we should preserve the current ptrace, the tracers should cooperate with ptrace too. IOW, this quickly leads to the new abstraction layer, I think. And of course it is possible to implement this new model on top of utrace. Yes, utrace itself comes with utrace_engine_ops vector to implement whatever you like, perhaps you dislike this part. Oleg.
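The cooperation problem described above (one tracer wants the tracee resumed, another wants it stopped) can be modelled very simply. The sketch below is a toy under stated assumptions: the enum and function are hypothetical names, not the utrace API, and the rule shown is just "the most restrictive request wins".

```python
from enum import IntEnum


class Resume(IntEnum):
    """Hypothetical resume requests; lower values are more restrictive."""
    STOP = 0
    SINGLESTEP = 1
    RESUME = 2


def combined_action(requests: dict) -> Resume:
    """Pick the tracee's next action from every attached engine's request."""
    # With no engines attached, the tracee simply runs.
    return min(requests.values(), default=Resume.RESUME)


engines = {"gdb": Resume.STOP, "strace": Resume.RESUME}
action = combined_action(engines)   # the tracee stays stopped for gdb
del engines["gdb"]                  # gdb detaches
action_after = combined_action(engines)
```

The point of the sketch is only that each tracer states what it wants independently, and some layer between them has to arbitrate.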
Re: linux-next: add utrace tree
On 01/26, Andi Kleen wrote: But when I did that I couldn't come up with a good scenario where multiple debuggers actually make sense. In a sense being a debugger is really a very intimate thing for a process. Do you really want to have multiple of them messing with each other? If yes how would they know what to touch and what not? Yes, multiple debuggers can confuse each other if they change the state of debuggee simultaneously. The user should do this ;) Can you think of any scenario where multiple debuggers on a process make sense? Simple example. Try to debug/strace strace or gdb itself. Not trivial, you can't attach to strace's tracees. Recently I spent 2 days trying to understand why strace -f hangs. I was able to attach to strace, but I wasn't able to see what its tracees do. And, it was not possible to even trace strace until it hangs, with ptrace the tracee (strace) must stop to report the event and this shadowed the race. Oleg.
Re: linux-next: add utrace tree
Simple example. Try to debug/strace strace or gdb itself. Not trivial, you can't attach to strace's tracees. Recently I spent 2 days trying to understand why strace -f hangs. I was able to attach to strace, but I wasn't able to see what its tracees do. But what would the semantics be inside the tracees even if you could? And, it was not possible to even trace strace until it hangs, with ptrace the tracee (strace) must stop to report the event and this shadowed the race. Shadowing the race was the second surname of strace I thought anyways @) Basically if you care about races never use strace in the first place. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: linux-next: add utrace tree
Linus == Linus Torvalds torva...@linux-foundation.org writes: Tom * Support displaced stepping in the kernel; I think this would improve Tom performance when debugging in non-stop mode. Linus Don't we already do that at least on x86? I don't know. If it does, and gdb does not yet use that, then that would be worth changing. Linus Or maybe I'm not understanding what displaced stepping means to you. In non-stop mode (where you can stop one thread but leave the others running), gdb wants to have the breakpoints always inserted. So, something must emulate the displaced instruction. Tom
Re: linux-next: add utrace tree
Tom * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Tom Internally we're already using a self-pipe to integrate this into Tom gdb's main loop. Relatedly, don't mess with the inferior's parentage. Andi How would having a kernel based solution be better over your Andi user space simulation? Signals and wait are a pain because if we want to use some random library in gdb, there might be conflicts. This is true even if we use signalfd. An fd-for-debugging does not have this problem. This matters more now that we're letting people script gdb in python. Tom
Re: linux-next: add utrace tree
On 01/26, Andi Kleen wrote: Simple example. Try to debug/strace strace or gdb itself. Not trivial, you can't attach to strace's tracees. Recently I spent 2 days trying to understand why strace -f hangs. I was able to attach to strace, but I wasn't able to see what its tracees do. But what would the semantics be inside the tracees even if you could? In this particular case, all I needed was something like gdb -p to attach to the tracee, see the backtrace and detach. And, it was not possible to even trace strace until it hangs, with ptrace the tracee (strace) must stop to report the event and this shadowed the race. Shadowing the race was the second surname of strace I thought anyways @) Basically if you care about races never use strace in the first place. Yes. And utrace doesn't require the tracee to be stopped to report the event ;) Yes, yes, utrace can't fix strace in this sense automatically, but still. Oleg.
Re: linux-next: add utrace tree
On Tue, 26 Jan 2010, Tom Tromey wrote: In non-stop mode (where you can stop one thread but leave the others running), gdb wants to have the breakpoints always inserted. So, something must emulate the displaced instruction. I'm almost totally uninterested in breakpoints that actually re-write instructions. It's impossible to do that efficiently and well, especially in threaded environments. So if you do instruction rewriting, I can only say that's your problem. But using the hardware breakpoints should automatically DTRT, both wrt threads _and_ wrt restarting. Sure, there's only a limited number of them, so if somebody wants more than that they are kind of screwed, but that's just how life is. Linus
Re: linux-next: add utrace tree
tromey wrote: [...] In non-stop mode (where you can stop one thread but leave the others running), gdb wants to have the breakpoints always inserted. So, something must emulate the displaced instruction. This sounds like the sort of thing that kernel kprobes do, which the uprobes patch does for userspace. The gdbstub prototype can use uprobes for such displaced breakpoints, and single-step-out-of-line to execute them on a few platforms like x86-*. This is already prototyped / working. (gdbstub currently restricts itself to single-threaded programs only, but that's another todo.) - FChE
Re: linux-next: add utrace tree
On Sun, 24 Jan 2010, Kyle Moffett wrote: The point that's being missed is that there is a chicken-and-egg problem here. The chicken is a replacement or extension to the debugger interface that would make it possible for me to do things like GDB a process while it's being strace'd or vice versa. The egg is the utrace bits, an unstable but somewhat arch-generic ABI that abstracts out ptrace() to make it possible to stack both in-kernel and userspace debuggers/tracers/etc and have multiple simultaneous users. Quite frankly, as far as I'm concerned, I'd be a whole lot more interested in utrace if its _only_ stated (and implied) goal was to do exactly this. The thing I object to is the whole dessert topping _and_ floor wax thing, with kernel interfaces for random other users. If somebody extended ptrace in good ways, that's a totally different thing. But I think utrace has been over-designed, possibly as a result of others coming in and saying hey, I'd like to use that too for xyz. Do one thing, and do it well. I'd not mind somebody improving ptrace (including extending its semantics - I do agree that the whole SIGSTOP thing makes it hard to have multiple debuggers). That said, I also suspect that people should still look seriously at simply just improving ptrace. For example, I suspect that the biggest problem with ptrace is really just the signalling, and that creating a new extension for JUST THAT, and then having a model where you can choose - at PTRACE_ATTACH time - how to wait for events would be a good thing. But as long as it is I want to solve all problems, I'm not very impressed. Maybe somebody would be interested in trying to take the utrace improvements, and scaling down what they promise, and ignoring all input except for I want to strace and gdb at the same time. So stop the crazy new kernel interfaces crap. Stop the crazy maybe we can use it for ftrace and generic user event tracing too. Stop the crazy. Linus
Re: linux-next: add utrace tree
Hi - On Mon, Jan 25, 2010 at 08:52:41AM -0800, Linus Torvalds wrote: [...] If somebody extended ptrace in good ways, that's a totally different thing. But I think utrace has been over-designed, possibly as a result of others coming in and saying hey, I'd like to use that too for xyz. [...] Earlier, you said that you haven't followed utrace at all. Upon what real information do you infer that it has been over-designed? - FChE
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Frank Ch. Eigler wrote: Earlier, you said that you haven't followed utrace at all. Upon what real information do you infer that it has been over-designed? Upon the information that people are talking about magic new kernel interfaces to do fancy things. And talking about doing things with it that are simply not relevant for ptrace/strace. In fact, in this very thread I've been informed that there are no user interfaces to utrace at all, which to me says that it's been TOTALLY MISDESIGNED FROM THE VERY START, and has nothing to do with making ptrace work for strace/gdb at the same time. In other words, I may not have followed utrace development, but I sure as hell can read. And everything I read about it just makes me less inclined to want to merge it. The people who argue for it are actually screwing themselves by arguing for all the wrong things, and making me convinced I don't want to touch it with a ten-foot pole. If somebody were to argue that this is a simple series of patches to clean up ptrace and make it possible to strace a debugged process, then that would have been different. That's not what you or others have been doing. You've been pushing exactly the _reverse_ of that, namely how great it is for some random totally new features that I'm convinced aren't even used by a lot of people. So give me a populist argument that makes sense for tons of actual users, not some f*cking here's a cool infrastructure that developers can do random crazy out-of-tree crap with. Because I'm not interested in crazy developers. Linus
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Linus Torvalds wrote: So give me a populist argument that makes sense for tons of actual users, not some f*cking here's a cool infrastructure that developers can do random crazy out-of-tree crap with. Because I'm not interested in crazy developers. In other words, give me the killer feature. The thing I've asked for all the time. The thing that you seem to continually NOT EVEN UNDERSTAND. Linus
Re: linux-next: add utrace tree
On Mon, 2010-01-25 at 09:36 -0800, Linus Torvalds wrote: Because I'm not interested in crazy developers. Linus Uh oh, that's not good for us real-time folks. http://lwn.net/Articles/357800/ And, according to Linus, the realtime people are crazy, so they can be left to deal with the weird stuff. -- Steve (Sorry, I just couldn't resist)
Re: linux-next: add utrace tree
Uh oh, that's not good for us real-time folks. http://lwn.net/Articles/357800/ And, according to Linus, the realtime people are crazy, so they can be left to deal with the weird stuff. I'd prefer the trees to be separate for testing purposes: it doesn't make much sense to have SMP support as a normal kernel feature when most people won't have SMP anyway -- Linus Torvalds Use cases got that into the tree pretty easily, I am sure RT ones will do the same.
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Steven Rostedt wrote: Uh oh, that's not good for us real-time folks. http://lwn.net/Articles/357800/ And, according to Linus, the realtime people are crazy, so they can be left to deal with the weird stuff. The RT people have actually been pretty good at slipping their stuff in, in small increments, and always with good reasons for why they aren't crazy. Yeah, it's taken them years, and they still have out-of-tree stuff. And yeah, they had to change some things to make them more palatable to the mainline kernel - the whole fundamental raw spinlock change is just the most recent example of that. But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy. And yeah, I still think the hard-RT people are mostly crazy. So I can work with crazy people, that's not the problem. They just need to _sell_ their crazy stuff to me using non-crazy arguments, and in small and well-defined pieces. When I ask for killer features, I want them to lull me into a safe and cozy world where the stuff they are pushing is actually useful to mainline people _first_. In other words, every new crazy feature should be hidden in a nice solid Trojan Horse gift: something that looks _obviously_ good at first sight. The fact that it may contain the germs for future features should be hidden so well that not only is it not used as an argument (Hey, look at all those soldiers in that horse, imagine what you could do with them), it should also not be obvious from the source code (Look at all those hooks I sprinkled around, which aren't actually used by anything, but just imagine what you could do with them). Linus
Re: linux-next: add utrace tree
On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy. Actually this is an understatement. Every feature (and I do mean _every_) that went from -rt into mainline, undertook 3 or more rewrites before it was acceptable for mainline. And every time, the end result made the -rt patch set better as a whole. Not to mention, that a lot of the early stuff also cleaned up mainline. You can't have Real-Time without having a clean kernel. And as you stated, a lot of those patches to clean up the kernel, no one even knew that the real reason was to help the -rt patch set. They were well disguised Trojan horses. Darn, it looks like you are onto our scheme. -- Steve
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Steven Rostedt wrote: On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy. Actually this is an understatement. Every feature (and I do mean _every_) that went from -rt into mainline, undertook 3 or more rewrites before it was acceptable for mainline. And every time, the end result made the -rt patch set better as a whole. Not to mention, that a lot of the early stuff also cleaned up mainline. You can't have Real-Time without having a clean kernel. And as you stated, a lot of those patches to clean up the kernel, no one even knew that the real reason was to help the -rt patch set. They were well disguised Trojan horses. Tsss. Never admit such things. Darn, it looks like you are onto our scheme. Which scheme ? The only Trojan horses in the kernel tree are in drivers/char/drivers/char/tty_io.c which put Linus himself into Linux-0.98.2 :) tglx
Re: linux-next: add utrace tree
* Thomas Gleixner t...@linutronix.de wrote: On Mon, 25 Jan 2010, Steven Rostedt wrote: On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy. Actually this is an understatement. Every feature (and I do mean _every_) that went from -rt into mainline, undertook 3 or more rewrites before it was acceptable for mainline. And every time, the end result made the -rt patch set better as a whole. Not to mention, that a lot of the early stuff also cleaned up mainline. You can't have Real-Time without having a clean kernel. And as you stated, a lot of those patches to clean up the kernel, no one even knew that the real reason was to help the -rt patch set. They were well disguised Trojan horses. Tsss. Never admit such things. Here's four examples of recent kernel features: - lockdep [1] - ftrace [2] - new-style generic mutexes and spin-mutexes [3] - the new arch/x86 tree[4] I suspect few would guess that all of these features were motivated by the -rt kernel originally: [1] lockdep started out as the 'track irqs-off sections' patches in -rt [2] ftrace started out as -rt's latency tracer and logdev [3] mutex.c was motivated by rtmutex.c [4] arch-x86 was motivated by annoyance with needless porting of -rt features from 32-bit to 64-bit x86 and back. [ Nor would you normally guess that Linux itself was motivated by a guy wanting to toy around with 32-bit x86 assembly ;-) ] Various forms of craziness that motivate us dont really hurt, as long as the process is rooted in reality. We can 'wish' for the crazier future stuff and can help it indirectly, and sometimes it might even happen down the road - but reality and common-sense utility is what controls. 
And note that there's nothing dishonest about doing multi-purpose patches, as long as the mainstream purpose isnt really just a decoy. When we decouple a feature from -rt we usually forget its -rt purpose and the intermediate for-mainstream forms arent even useful for -rt - back-integration into -rt comes at a later stage. This makes it doubly sure that it's all formed by mainstream's need, not -rt's needs. In the few cases where the -rt role is prominent for some weird reason we declare it as such. It's the exception to the rule really - few useful kernel features are single purpose. ( When they are then we are likely doing something wrong. -rt _is_ a special case. ) Ingo
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Mark Wielaard wrote: And all these users have wishes to extend the current ptrace interface mess. But nobody dares to extend ptrace in any direction because fixing/cleaning up one of these use cases might break the others in subtle and not so subtle ways. Which is why the utrace series of patches is cleaning up all this stuff first. I call bullshit. You can clean up ptrace without introducing odd new interfaces and trying to sell it as some revolutionary new kernel interface that can do anything. I also call bullshit on the ptrace() is so horribly nasty argument. Yes, I've seen the code that uses ptrace in user space, and yes, it's nasty, but it's invariably _not_ nasty so much because ptrace itself is nasty, but because it's full of #ifdef so-and-so-os/so-and-so-arch, and the code is never cleaned up. There are a couple of obvious cases of ptrace being uglier-than-it-needs- to-be. Like the traditional ptrace read/write interface being purely word at a time, and that clearly is not pretty. Several architectures already do copy range kind of versions on it, though, so that's just a detail, and if anybody wanted to clean it up, they could have. The more fundamental problem is the use of signals (while at the same time wanting to _trap_ non-ptrace signals), without any model for a connection state, which is why you can have only one tracer. But again, that's largely a user interface issue, and apparently utrace does _nothing_ for that problem at all. So I do agree that ptrace is not a great interface. However: repeating that statement over and over in _no_ way excuses some totally unrelated code that doesn't have anything what-so-ever to do with the actual problems of ptrace. Linus
Re: linux-next: add utrace tree
Linus == Linus Torvalds torva...@linux-foundation.org writes: Linus No. There is absolutely _no_ reason to believe that gdb et al would ever Linus delete the ptrace interfaces anyway. Yes, in GDB we approximately never delete anything. Nevertheless, if the Linux kernel were to present a new user-space API, and if it had an advantage over ptrace, then we would port GDB to use it. There are other platforms where, IIRC, we now use some /proc thing instead of ptrace. There are definitely things we would like from such an API. Here's a few I can think of immediately, there are probably others. * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Internally we're already using a self-pipe to integrate this into gdb's main loop. Relatedly, don't mess with the inferior's parentage. * Support displaced stepping in the kernel; I think this would improve performance when debugging in non-stop mode. * Support some kind of breakpoint expression in the kernel; this would improve performance of conditional breakpoints. Perhaps the existing gdb agent expressions could be used. Tom
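The self-pipe arrangement Tom mentions (turning SIGCHLD into something a main loop can select() on) can be sketched roughly as follows. This is a minimal illustration in Python, not GDB's code; the function name run_and_wait and its timeout parameter are made up for the example, and it assumes it is called from the main thread of a Unix process.

```python
import os
import select
import signal
import subprocess
import sys


def run_and_wait(argv, timeout=10.0):
    """Start a child and learn about its exit through a file descriptor.

    The SIGCHLD handler only writes one byte into a pipe; the caller waits
    on the read end with select(), as an event loop would.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)

    def on_sigchld(signum, frame):
        try:
            os.write(write_fd, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        child = subprocess.Popen(argv)
        # Even if the child exits before we get here, the byte is already
        # in the pipe, so the wakeup is not lost.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            child.kill()
            child.wait()
            raise TimeoutError("child did not exit in time")
        os.read(read_fd, 4096)  # drain the wakeup byte(s)
        return child.wait()     # reap the child and return its exit status
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(run_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

A kernel interface that handed the debugger an fd directly would make the signal handler and the pipe unnecessary; the select() side would stay the same.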
Re: linux-next: add utrace tree
On Mon, 25 Jan 2010, Tom Tromey wrote: There are definitely things we would like from such an API. Here's a few I can think of immediately, there are probably others. * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Internally we're already using a self-pipe to integrate this into gdb's main loop. Relatedly, don't mess with the inferior's parentage. As I kind of alluded to elsewhere, I heartily agree with this. The really major design mistake of ptrace (as opposed to just various ugly corners) is how it has no connection information, and that ends up being one of the main reasons why you can't have two ptracers working on the same thing. (There are other things that complicate that too, of course, like simply just trying to manage various per-thread state like debug registers etc, but that's a separate class of complications). * Support displaced stepping in the kernel; I think this would improve performance when debugging in non-stop mode. Don't we already do that at least on x86? Just doing a single-step should work on an instruction even if it has a breakpoint on it, because we set the TF bit. Or maybe I'm not understanding what displaced stepping means to you. * Support some kind of breakpoint expression in the kernel; this would improve performance of conditional breakpoints. Perhaps the existing gdb agent expressions could be used. I suspect it might be reasonable to do simple expressions on breakpoints, but not the kind of things gdb exports to users. IOW, maybe you could have a single conditional on a single value (register or memory) associated with an expression. Regardless, internally to the kernel your two later issues are details. The how to connect to the debuggee is a much more fundamental issue, and has the biggest design/interface impact. The other would likely just be new ptrace command extensions that somebody would have to just implement the grotty details on. Linus
Re: linux-next: add utrace tree
Let me add my two euro-cents to this discussion. Mark Wielaard m...@redhat.com: Unfortunately ptrace does all that magic already (badly). People don't just use it for (s)tracing syscalls, but also for tracing signals, for single step debugging and poking at memory, register state, for process jailing and virtualization (uml) through syscall emulation. So when they are talking about these fancy things that is because that is what ptrace gives them currently. And they hate it, because the ptrace interface is such a pain to work with. And all these things don't really work together. You cannot trace, emulate, debug, jail at the same time. I support Mark's words. I don't use ptrace for debugging/tracing and I have experienced severe limitations of ptrace interface. (I have tried to post some extensions for ptrace to overcome some constraints see my posts on ptrace_vm or ptrace_multi on LKML). Oleg Nesterov, writing to Andrew Morton said: First of all, utrace makes other things possible. gdbstub, nondestructive core dump, uprobes, kmview, hopefully more. I didn't look at these projects closely, perhaps other people can tell more. As for their merge status, until utrace itself is merged it is very hard to develop them out of tree. In the list above there is also kmview, which is a creature of mines. umview and kmview are partial virtual machines, processes running in a [uk]mview machine can have their own view for the file system, networking support, user-id, system-name, etc. A [uk]mview machine virtualizes just what the user need: the filesystem or just a subtree/some subtrees or networking or define one/some virtual devices, etc. The view provided by a [uk]mview machine can be a composition of real resources (provided by the Linux kernel) and virtual resources. Each system call request gets hijacked to a module of [uk]mview when it refers to a virtual resource. The request is forwarded to the kernel otherwise. 
umview is based on ptrace, kmview uses a kernel module based on utrace. (umview is included in debian lenny (to sid), tutorial and manuals in wiki.virtualsquare.org) IMHO utrace is better than ptrace (or an optimized version of it): 1 - Frank Ch. Eigler wrote: At least one reason is that ptrace is single-usage-only, so for example you cannot concurrently debug and strace the same program. - exactly. utrace allows multiple tracing engines, this means that kmview machines can be nested (in a natural way, no extra code is needed for this feature). In the same way strace/gdb can run on virtualized processes, too. 2 - kmview kernel module implements several optimizations to minimize the number of requests forwarded to the kmview process (the virtual machine monitor). kmview is just a module using the utrace interface, prior attempts of optimized umview required kernel patches. Like kmview any other service requiring process tracing can include specific optimizations in its own kernel module. On the other hand, all these services could use the standardized utrace interface for their optimizations, instead of asking for messy patches to change code all around the kernel source. 3 - ptrace takes SIGSTOP/SIGCONT for its own management. Strace/gdb and umview cannot be transparent for programs using these signals. Oleg Nesterov talking about ptrace said: Of course they can't use other interfaces, we don't have them. And without the new abstraction layer we will never have, I think. I agree. The following list includes the execution times I got in a recent test (make vde-2, see http://www.cs.unibo.it/~renzo/view-os-lk2009.pdf): plain kernel 22.7s, kmview (no modules) 23.9s (+5.5%), full kmview (modules loaded, all syscall virtualized) 38.5s (+70%), optimized umview 51.0s (+124%), umview on vanilla kernel 75.7s (+233%). utrace can be used to speed up virtualization (at least in my case it worked in this way).
Performance can be useful for debugging but it is a main issue for virtualization. Kmview module provides optimizations to select the system call requests depending on the syscall number, the pathnames or the file descriptors. http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications Trying to add all the optimizations needed by different projects to ptrace is a never-ending nightmare: the LKML will continue to receive patch proposals for ptrace... The solution is that everybody can code his/her optimized kernel/user interface for tracing in his/her kernel module, i.e. utrace. renzo
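The relative overheads quoted with the timings in this message can be recomputed from the listed seconds. The sketch below does only that arithmetic; the names TIMINGS, overhead_percent and overhead_table are made up for the example. Recomputing from the rounded values gives about +5.3% and +124.7% for two of the entries, slightly different from the quoted +5.5% and +124%, possibly because the quoted percentages were derived from unrounded measurements (that explanation is a guess, not something stated in the message).

```python
# Wall-clock timings in seconds, copied from the message above.
TIMINGS = {
    "plain kernel": 22.7,
    "kmview (no modules)": 23.9,
    "full kmview": 38.5,
    "optimized umview": 51.0,
    "umview on vanilla kernel": 75.7,
}


def overhead_percent(seconds, baseline):
    """Slowdown relative to the baseline, as a percentage."""
    return (seconds / baseline - 1.0) * 100.0


def overhead_table(timings, baseline_key="plain kernel"):
    """Map each non-baseline configuration to its overhead percentage."""
    baseline = timings[baseline_key]
    return {
        name: overhead_percent(seconds, baseline)
        for name, seconds in timings.items()
        if name != baseline_key
    }


if __name__ == "__main__":
    for name, pct in overhead_table(TIMINGS).items():
        print(f"{name}: +{pct:.1f}%")
```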
Re: linux-next: add utrace tree
On Tue, 26 Jan 2010, Renzo Davoli wrote: The solution is that everybody can code his/her optimized kernel/user interface for tracing in his/her kernel module, i.e. utrace. I don't think people understand. That is simply not a solution. That is a PROBLEM. The thing you describe is an absolute disaster. Which is exactly why I rant against it. The last thing we want to have is here, take this, and make your own kernel module mess around it optimized for your particular crazy scenario. But every SINGLE post in this thread that has argued for utrace has argued exactly this way. Linus
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 09:04:56PM -0800, Linus Torvalds wrote: The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. No. There is absolutely _no_ reason to believe that gdb et al would ever delete the ptrace interfaces anyway. More to the point, gdb *couldn't* use utrace, because utrace only exports a kernel API; not a syscall interface. And if the Red Hat Toolchain folks are thinking about encouraging gdb to start creating out-of-tree kernel modules, so that (a) gdb requires root privs, and (b) gdb is as (un)stable as SystemTap with respect to development kernels by making it dependent on internal kernel API's, the Red Hat Toolchain group needs to be smacked upside the head... - Ted
Re: linux-next: add utrace tree
Hi - On Sun, Jan 24, 2010 at 05:25:13AM -0500, ty...@mit.edu wrote: [...] The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. No. There is absolutely _no_ reason to believe that gdb et al would ever delete the ptrace interfaces anyway. More to the point, gdb *couldn't* use utrace, because utrace only exports a kernel API; not a syscall interface. Yes, this might explain why Kyle wrote: [...] I believe that utrace is the kernel side of that API. [...] And if the Red Hat Toolchain folks are thinking about encouraging gdb to start creating out-of-tree kernel modules [...] the Red Hat Toolchain group needs to be smacked upside the head... Those keeping up will note that an ordinary in-tree, non-modular, non-root-only, already-works-with-standard-gdb, potentially-better-than-ptrace debugger interface has already been prototyped and posted on lkml as an RFC. - FChE
Re: linux-next: add utrace tree
On Sat, 23 Jan 2010, Frank Ch. Eigler wrote: On Sat, Jan 23, 2010 at 07:04:01AM +0100, Ingo Molnar wrote: [...] Also, if any systemtap person is interested in helping us create a more generic filter engine out of the current ftrace filter engine (which is really a precursor of a safe, sandboxed in-kernel script engine), that would be excellent as well. [...] Thank you for the invitation. More could be done - a simple C-like set of functions perhaps - some minimal per probe local variable state, etc. (perhaps even looping as well, with a limit on number of predicate executions per filter invocation.) Yes, at some point when such a bytecode interpreter gets rich enough, one may not need the translated-to-C means of running scripts. ( _Such_ a facility, could then perhaps be used to allow applications access to safe syscall sandboxing techniques: i.e. a programmable seccomp concept in essence, controlled via ASCII space filter expressions [...] IMHO that would be a superior concept for security modules too [...] [...] specific functionality with an immediately visible upside, with no need for opaque hooks. This OTOH seems like rather a stretch. If one claims that opaque hooks are bad, so instead have hooks that jump not to auditable C code but to a bytecode interpreter? And have the bytecodes be uploaded from userspace? How is this supposed to produce transparency from the kernel/hook point of view? Simply because the kernel controls which byte code is executed and has control over the functionality behind it. That makes the hooks well defined and transparent. Thanks, tglx
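The kind of filter engine discussed in this message (a small fixed instruction set, optional looping, and a cap on how much work one invocation may do) can be sketched as below. This is a toy model in Python under stated assumptions: the opcode names, the tuple-based program format and the max_steps budget are invented for the example and do not correspond to the ftrace filter engine or to any kernel interface.

```python
import operator


class FilterError(Exception):
    """Raised when a filter program is malformed or exceeds its budget."""


# The only binary operations a program may use; anything else is rejected.
BINARY_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
    "add": operator.add,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}


def run_filter(program, fields, max_steps=64):
    """Evaluate a list of (opcode, argument) pairs against event fields.

    Returns True or False. Jumps allow loops, but every executed
    instruction counts against max_steps, so evaluation always terminates.
    """
    stack = []
    pc = 0
    steps = 0
    try:
        while pc < len(program):
            steps += 1
            if steps > max_steps:
                raise FilterError("step budget exceeded")
            op, arg = program[pc]
            pc += 1
            if op == "push":        # push a constant
                stack.append(int(arg))
            elif op == "load":      # push the value of a named event field
                stack.append(int(fields[arg]))
            elif op in BINARY_OPS:  # pop two operands, push the result
                rhs = stack.pop()
                lhs = stack.pop()
                stack.append(int(BINARY_OPS[op](lhs, rhs)))
            elif op in ("jmp", "jz"):
                if not isinstance(arg, int) or not 0 <= arg <= len(program):
                    raise FilterError("jump target out of range")
                if op == "jmp" or stack.pop() == 0:
                    pc = arg
            else:
                raise FilterError(f"unknown opcode {op!r}")
        return bool(stack.pop())
    except (IndexError, KeyError, TypeError, ValueError) as exc:
        raise FilterError(f"malformed program: {exc!r}") from exc


if __name__ == "__main__":
    # "syscall number equals 1", expressed in the toy instruction set.
    only_nr_1 = [("load", "nr"), ("push", 1), ("eq", None)]
    print(run_filter(only_nr_1, {"nr": 1}), run_filter(only_nr_1, {"nr": 2}))
```

Because the evaluator accepts only the listed opcodes and enforces the step budget itself, the party running it stays in control of what an uploaded program can do, which is the property tglx points to.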
Re: linux-next: add utrace tree
Hi - tytso wrote: [...] Let me see if I can paraphrase those of your concerns that were substantive: 1) That if utrace is merged, and systemtap keeps on using it, there may be some sort of chilling effect on kernel developers that would impede utrace's future development. This might sound plausible to an outsider, but luckily we're not stuck with having to speculate: one can examine history. Systemtap has been around, working roughly the same way, for about *five years*. Systemtap modules use more than a handful of mainstream module-accessible kernel services. During all this time, how many examples have there been when systemtap developers have pleaded with lkml to avoid changing some prior interface? How many of those successfully? (That last one is a trick question, since both numbers are really close to *zero*.) How much real impediment to change has our mere existence caused? 2) That systemtap is not portable to all kernel versions. Problems do periodically occur. However, one can again refer to historical facts to assess whether in fact they warrant long term grudges. In every release note, we list the range of kernel versions we test against. We may have one of the broadest ranges of support, 2.6.9 through to many current -rc*s and non-linus trees. We have several mechanisms which let us easily adapt to most changes. It may interest readers to find out that the number of systemtap changes we have had to add on account of kernel changes is on the order of a *few per year*. The usual turnaround, once reported, is on the order of a *few days*. 3) That systemtap users will complain to kernel developers if systemtap becomes incompatible. Let's go to the historical record again. How many such complaints have actually been seen in inappropriate fora such as lkml? How difficult were they to diagnose / redirect to the proper venue? Have they constituted a loss of face for kernel developers? 4) That systemtap is almost but not quite as evil as nvidia.
It seems factors like ... - always being completely open source project - keeping in regular contact with lkml and other constituencies - not being related to essential hardware enablement, so users not wanting it don't have to touch it - the compile-to-C approach being technologically necessary since there was no alternative plausible way at the time (and still now) - repeatedly offering infrastructure code with non-stap uses ... all add up to a mere nudge away from entirely evil. If so, I wonder if your sort of grossly bimodal view of ethical virtue is going to foster the right sorts of change in the linux kernel community. - FChE
Re: linux-next: add utrace tree
On 01/24/10 13:01, Frank Ch. Eigler wrote: ... all add up to a mere nudge away from entirely evil. If so, I wonder if your sort of grossly bimodal view of ethical virtue is going to foster the right sorts of change in the linux kernel community. Nothing like a good religious debate to liven up your Sunday... - FChE
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 14:48, ty...@mit.edu wrote: The fundamental issue which Ingo is trying to say (and which you apparently don't seem to be understanding) is that utrace doesn't export a syscall (which is an ABI that we are willing to promise will be stable), but rather a set of kernel API's (which we never promise to be stable), The point that's being missed is that there is a chicken-and-egg problem here. The chicken is a replacement or extension to the debugger interface that would make it possible for me to do things like GDB a process while it's being strace'd or vice versa. The egg is the utrace bits, an unstable but somewhat arch-generic ABI that abstracts out ptrace() to make it possible to stack both in-kernel and userspace debuggers/tracers/etc and have multiple simultaneous users. and the fact that there will be out-of-tree programs that are going to be trying to depend on that interface (much like Systemtap does today when it creates kernel modules) is something that is considered on par with Nvidia trying to ship proprietary video drivers. Ugh... perhaps we should derive a variation of Godwin's law for this: As an LKML discussion grows longer, the probability of an unfavorable comparison involving nVidia or Microsoft approaches 1. If you want to try to slide utrace in, such that we're able to ignore the fact that there will be this external house that will be built on quicksand, pointing at how nice the external house will be isn't going to be helpful. Nor is pointing at the ability that other people will be able to build other really nice houses on the aforementioned quicksand (i.e., out-of-tree kernel modules that depend on kernel API's). Personally I don't give a flying about SystemTap; I'm interested in things like the ability to stack gdb with strace, the RFC gdb-stub posted a week ago, etc. None of those abilities would be out-of-tree modules at all, and therefore the quicksand analogy is specious.
A simple code cleanup argument is not carrying the day (Look! We can cleanup the ptrace code!). It's going to have to be a **really** cool in-tree kernel functionality that provides a killer feature (in Linus's words), enough so that people are willing to overlook the fact that there's this monster external out-of-tree project that wants to depend on API's that may not be stable, and which, even if the developers don't grump at us, users will grump at us when we change API's that we had never guaranteed will be stable, and then Systemtap breaks. I would be willing to guess that something like 95% of the people using SystemTap or other tools are doing so on Red Hat Enterprise Linux or other enterprise supported platforms, and so when something breaks they go whinge at Red Hat, etc. If I recall correctly Red Hat and many of the other vendors already heavily fiddle with kernel patches they apply to provide some amount of binary module compatibility. This is probably why Ingo invited you to think about ways of doing some kind of safe in-kernel bytecode approach. That has the advantage of doing away with external kernel modules, with all of their many downsides: its dependency on unstable kernel API's, the fact that many financial customers have security policies that prohibit C compilers on production machines, the inherent security risk of allowing external random kernel modules to be delivered and loaded into a system, etc. There are substantial non-SystemTap uses for utrace that would *not* be satisfied by an in-kernel bytecode approach, starting with stacking debuggers and tracers. Furthermore, let's say they did go off and build the in-kernel bytecode interpreter. I can pretty much guarantee that people would say the hooks into the rest of the kernel are too invasive and they should be abstracted out into an API. *This is that API!* Cheers, Kyle Moffett
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 12:23:33PM +0100, Ingo Molnar wrote: * Kyle Moffett k...@moffetthome.net wrote: On Fri, Jan 22, 2010 at 19:22, Linus Torvalds torva...@linux-foundation.org wrote: ... In that sense it might be better to fix/enhance ptrace, if there's interest. I've written a handful of ptrace extensions in the past (none of them went upstream tho), it can be done in a useful manner and the code is pretty hackable. There are basic problems left to be solved: for example why is there still no 'memory block copy' call, why are we _still_ limited to one word per system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has PTRACE_WRITE*/READ* support that implements this, but none of the other architectures have it so it's essentially unused. Or another possible direction would be to extend the perf events syscall with interception capabilities. It's far more performant at extracting application state without scheduling than any ptrace method - and interception/injection would be a natural next step - if there's interest. This certainly is now a chicken and egg problem. Everybody agrees that Linux needs something better than ptrace; legacy ptrace will continue to live, so will utilities written to it (strace, etc). But should that limit what Linux can offer? What's the way out? - Enhance ptrace: At least one ptrace maintainer (Roland) had publicly stated he doesn't prefer enhancing legacy ptrace -- that it's already a beast to maintain, and adding more complexity to it does it no good. - Extend perf; would perf then use utrace underneath? Or would one have to redo some of what utrace already does for thread level control? - Give utrace a syscall and make it the primary way for users to interact with the layer. There are benefits to this if there is agreement on the utrace layer itself, maybe with less flexibility than what it currently offers? If yes, what should it look like?
Any new debug facility will have to incorporate some or most learnings from what utrace tried to address. It would be sad to just dump utrace and redo everything from scratch or band-aid existing interfaces. Ananth
Re: linux-next: add utrace tree
On Sun, Jan 24, 2010 at 08:42:13PM -0500, Kyle Moffett wrote: Personally I don't give a flying about SystemTap; I'm interested in things like the ability to stack gdb with strace, the RFC gdb-stub posted a week ago, etc. None of those abilities would be out-of-tree modules at all, and therefore the quicksand analogy is specious. Great. So what should be reviewed is utrace *plus* these other userland interfaces, which may get critiqued and improved, and utrace patches can be reviewed in light of these new features. But be warned if it turns out that only 30% of utrace is only needed to support gdb stacking with strace, etc., the other 70% will likely get ejected and the utrace patches streamlined to support these in-tree users. But since you don't give a flying about SystemTap, presumably you won't mind, right? I would be willing to guess that something like 95% of the people using SystemTap or other tools are doing so on Red Hat Enterprise Linux or other enterprise supported platforms, and so when something breaks they go whinge at Red Hat, etc. If I recall correctly Red Hat and many of the other vendors already heavily fiddle with kernel patches they apply to provide some amount of binary module compatibility. Sure, but as out-of-tree modules, the best they can expect is that most kernel developers will pretend that they don't exist. Which is OK, when I tried using SystemTap most of the concerns which I expressed as being critical for kernel developers were largely ignored (as near as I could tell) because the target market was RHEL corporate customers, and they prioritized their resourcing accordingly --- so they shouldn't mind if kernel developers return the favor. But that means that we should only merge those portions of utrace that are needed for these alleged killer new features, and only if these new features are cool enough that they justify the new code on their own merits. At least, IMNSHO. - Ted
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 2:22 AM, Linus Torvalds torva...@linux-foundation.org wrote: This is why when somebody brought up you could do a seccomp-like thing on top of utrace that my reaction was and is just totally negative. It shows all the wrong kinds of tying things together. seccomp-via-utrace should be just removed to be honest before its users. It entered the tree because it was very small and simple. If rewritten, it no longer is small and simple because of whole kernel/utrace.c.
Re: linux-next: add utrace tree
The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. Years ago (and it really must be years ago because this was about the time I started hacking on Linux stuff !) there was a proposal to extract and sanitize the arch specific stuff in binutils and in gdb etc into sensible libraries that could be used by other apps. What I don't understand is why that doesn't solve 99% of your problem. ptrace is not perfect but most of the real ptrace limitations actually come about because either the CPU can't do something or because the supporting logic would be too expensive - things like having extra private debugger pages. Yes ptrace needs a lot of icky support code, but it's already been written... Alan
Re: linux-next: add utrace tree
* Kyle Moffett k...@moffetthome.net wrote: On Fri, Jan 22, 2010 at 19:22, Linus Torvalds torva...@linux-foundation.org wrote: There are cases where we really _want_ to have common code. We want to have a common VFS interface because we want to show _one_ interface to user space across a gazillion different filesystems. We want to have a common driver layer (as far as possible) because - again - we expose a metric shitload of drivers, and we want to have one unified interface to them. So... Everybody agrees that ptrace() is horrible and a royal pain to use, let alone use correctly and without bugs. Everybody also agrees that ptrace() needs to stay around for a long time to avoid breaking all the existing users. Now how do we get from here to a moderately portable API for interrogating, controlling, and intercepting process state? Essentially it would need to support all of the things that a powerful debugger would want to do, including modifying registers and memory, substituting syscall return values, etc. I believe that utrace is the kernel side of that API. The problem is, utrace does not do that really. What utrace does is that it provides an opaque set of APIs for unspecified and out of tree _kernel_ modules (such as systemtap). It doesnt support any 'application' per se. It basically removes the kernel's freedom at shaping its own interaction with debug application. If utrace was a 'better ptrace' syscall, where the syscall itself is the goal of the hookery, it would all be rather different. People could argue about _that_ interface (and the hooks would be a pure kernel internal implementational detail - not an interface specification), and once people agree about that ABI and there's enough application momentum behind it, the hooks are really not that opaque anymore - they are for that ABI and not more. Note that it's still a _big_ hurdle: it's hard to agree on a new syscall and it's hard to get 'application momentum' behind it. 
Special Linux system calls have a checkered past: they tend not to be used by much of anything, and thus they tend to be a breeding ground of bugs, maintenance complexity and security problems. Lack of attention is never good. In that sense it might be better to fix/enhance ptrace, if there's interest. I've written a handful of ptrace extensions in the past (none of them went upstream though), it can be done in a useful manner and the code is pretty hackable. There are basic problems left to be solved: for example why is there still no 'memory block copy' call, why are we _still_ limited to one word per system call with PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has PTRACE_WRITE*/READ* support that implements this, but none of the other architectures have it so it's essentially unused. Or another possible direction would be to extend the perf events syscall with interception capabilities. It's far more performant at extracting application state without scheduling than any ptrace method - and interception/injection would be a natural next step - if there's interest. Thanks, Ingo
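To make the one-word-per-call limitation concrete, here is a minimal sketch in C of what a block copy out of a traced process looks like with PTRACE_PEEKDATA. The helper names (peek_block, demo_peek) and the sample buffer are hypothetical, chosen only for illustration; the sketch assumes a Linux system where a process is allowed to trace its own child.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sample data; a forked child inherits it at the same address. */
static const char sample[] = "one word per PTRACE_PEEKDATA call";

/*
 * Copy len bytes out of a stopped tracee, one word per ptrace() call.
 * Returns the number of ptrace() calls made, or -1 on error.
 */
static long peek_block(pid_t pid, const char *remote, char *local, size_t len)
{
    long calls = 0;

    assert(local != NULL);
    for (size_t off = 0; off < len; off += sizeof(long)) {
        size_t chunk = len - off < sizeof(long) ? len - off : sizeof(long);
        long word;

        errno = 0; /* -1 can be a valid data word, so errno signals failure */
        word = ptrace(PTRACE_PEEKDATA, pid, (void *)(remote + off), NULL);
        if (word == -1 && errno != 0)
            return -1;
        memcpy(local + off, &word, chunk);
        calls++;
    }
    return calls;
}

/* Fork a child that asks to be traced and stops, then read sample[] from it. */
static long demo_peek(char *out, size_t len)
{
    int status;
    long calls;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP); /* stop so the parent can inspect us */
        _exit(0);
    }
    if (waitpid(child, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    calls = peek_block(child, sample, out, len);
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return calls;
}
```

Every sizeof(long) bytes costs one system call here, so copying a single 4 KiB page on a 64-bit machine takes 512 calls. That per-word overhead is what a block-copy request would remove.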
Re: linux-next: add utrace tree
Hi - mingo wrote: [...] Now how do we get from here to a moderately portable API for interrogating, controlling, and intercepting process state? Essentially it would need to support all of the things that a powerful debugger would want to do, including modifying registers and memory, substituting syscall return values, etc. I believe that utrace is the kernel side of that API. The problem is, utrace does not do that really. In fact, it is exactly designed for that. What utrace does is that it provides an opaque set of APIs for unspecified and out of tree _kernel_ modules (such as systemtap). It doesnt support any 'application' per se. It basically removes the kernel's freedom at shaping its own interaction with debug application. This claim is hard to take any more seriously than emoting that the blockio layer is opaque because device drivers remove freedom for the kernel to shape its interaction with hardware. If you have any *real evidence* about how any present user of utrace misuses that capability, or interferes with the kernel's freedom, show us please. - FChE
Re: linux-next: add utrace tree
Hi - On Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox wrote: [...] What I don't understand is why [libgdb?] doesn't solve 99% of your problem. ptrace is not perfect but most of the real ptrace limitations actually come about because either the CPU can't do something or because the supporting logic would be too expensive - things like having extra private debugger pages. At least one reason is that ptrace is single-usage-only, so for example you cannot concurrently debug and strace the same program. OTOH, utrace is designed to permit clean nesting/sharing semantics for concurrent debugger-type tools operating on the same processes. - FChE
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox wrote: Years ago (and it really must be years ago because this was about the time I started hacking on Linux stuff !) there was a proposal to extract and sanitize the arch specific stuff in binutils and in gdb etc into sensible libraries that could be used by other apps. Hallelujah if it had happened at that time, but sadly... :-( - Arnaldo
Re: linux-next: add utrace tree
On Sat, Jan 23, 2010 at 06:47:29AM -0500, Frank Ch. Eigler wrote: What utrace does is that it provides an opaque set of APIs for unspecified and out of tree _kernel_ modules (such as systemtap). It doesnt support any 'application' per se. It basically removes the kernel's freedom at shaping its own interaction with debug application. This claim is hard to take any more seriously than emoting that the blockio layer is opaque because device drivers remove freedom for the kernel to shape its interaction with hardware. If you have any *real evidence* about how any present user of utrace misuses that capability, or interferes with the kernel's freedom, show us please. The fundamental issue which Ingo is trying to say (and which you apparently don't seem to be understanding) is that utrace doesn't export a syscall (which is an ABI that we are willing to promise will be stable), but rather a set of kernel API's (which we never promise to be stable), and the fact that there will be out-of-tree programs that are going to be trying to depend on that interface (much like Systemtap does today when it creates kernel modules) is something that is considered on par with Nvidia trying to ship proprietary video drivers. (OK, maybe not *quite* as evil as Nvidia because at least SystemTap is open source, but the bottom line is that enabling out-of-tree modules isn't considered a good thing, and if we know in advance that there are out-of-tree modules, there is a strong tendency to want to nip those in the bud.) The reason why I avoid Nvidia hardware like the plague is because I work on bleeding-edge kernels, and even though companies like Nvidia and Broadcom try very hard to keep up with released upstream kernels, #1, there is always the concern of what happens if they decide to change that policy, and #2, invariably something will break during the -rc1 or -rc2 stage, and then my laptop is useless for running bleeding edge kernels. 
It's one of the reasons why many kernel developers gave up on SystemTap, because it's not something that can be trusted to be there, and the fault is not on our changing the API's, it's on SystemTap depending on API's that were never guaranteed to be stable in the first place. If you want to try to slide utrace in, such that we're able to ignore the fact that there will be this external house that will be built on quicksand, pointing at how nice the external house will be isn't going to be helpful. Nor is pointing at the ability that other people will be able to build other really nice houses on the aforementioned quicksand (i.e., out-of-tree kernel modules that depend on kernel API's). A simple code cleanup argument is not carrying the day (Look! We can clean up the ptrace code!). It's going to have to be a **really** cool in-tree kernel functionality that provides a killer feature (in Linus's words), enough so that people are willing to overlook the fact that there's this monster external out-of-tree project that wants to depend on API's that may not be stable, and which, even if the developers don't grump at us, users will grump at us when we change API's that we had never guaranteed will be stable, and then Systemtap breaks. This is probably why Ingo invited you to think about ways of doing some kind of safe in-kernel bytecode approach. That has the advantage of doing away with external kernel modules, with all of their many downsides: their dependency on unstable kernel API's, the fact that many financial customers have security policies that prohibit C compilers on production machines, the inherent security risk of allowing external random kernel modules to be delivered and loaded into a system, etc. - Ted
Re: linux-next: add utrace tree
On Sat, 23 Jan 2010, Kyle Moffett wrote: Now how do we get from here to a moderately portable API for interrogating, controlling, and intercepting process state? Umm? ptrace? It's not _pretty_, but it's a hell of a lot more portable than utrace is ever going to be. Yes, the details differ between OS's (and between architectures), but let's face it, things like register state probing are _never_ going to be portable across different architectures simply because the register state isn't the same. The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. No. There is absolutely _no_ reason to believe that gdb et al would ever delete the ptrace interfaces anyway. That really is my point. Adding a new interface, when an old and crufty (but working) interface is inevitably going to be around anyway - and is inevitably always going to have portability issues - is STUPID. Let's take strace, for example. Yes, ptrace() is crufty, but have you actually looked at strace source code? The problem isn't really a crufty interface to read registers etc, the bigger problem for strace is that different architectures and OS's have different system call argument rules, different ways to read/write system call numbers yadda yadda yadda. Take a look at strace sources some day. Moving away from ptrace on Linux (even if you decided that you don't care about old versions of the kernel that don't know anything else) would simplify ABSOLUTELY NOTHING. Really. Quite the reverse, I suspect. The Solaris and FreeBSD support uses ptrace too, afaik, so you'd just be confusing the issue. And the fact is, strace would still end up supporting ptrace anyway, just so that you could run it on old kernels. 
So the whole "making a new utrace interface would simplify things" is simply a total lie. The fact that ptrace is a bit of an odd interface IN NO WAY means that any other interface would end up being appreciably simpler. It would just result in _more_ code in strace, and more confusion. Linus
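The point about per-architecture system call conventions can be illustrated with a small C sketch. It assumes a register snapshot in the struct user_regs_struct layout from <sys/user.h> (as filled in by PTRACE_GETREGS or PTRACE_GETREGSET); the helper names are hypothetical, and only three architectures are covered, which is itself the point.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/user.h>

/*
 * Hypothetical helpers: where the syscall number and first argument live
 * in a register snapshot. Every architecture needs its own branch, no
 * matter which kernel interface delivered the registers.
 */
#if defined(__x86_64__)
static inline long long get_syscall_nr(const struct user_regs_struct *r)
{
    return (long long)r->orig_rax;      /* number: orig_rax */
}
static inline void set_syscall_nr(struct user_regs_struct *r, long long nr)
{
    assert(r != NULL);
    r->orig_rax = nr;
}
static inline unsigned long long get_first_arg(const struct user_regs_struct *r)
{
    return r->rdi;                      /* args: rdi, rsi, rdx, r10, r8, r9 */
}
#elif defined(__i386__)
static inline long long get_syscall_nr(const struct user_regs_struct *r)
{
    return r->orig_eax;                 /* number: orig_eax */
}
static inline void set_syscall_nr(struct user_regs_struct *r, long long nr)
{
    assert(r != NULL);
    r->orig_eax = (long)nr;
}
static inline unsigned long long get_first_arg(const struct user_regs_struct *r)
{
    return (unsigned long)r->ebx;       /* args: ebx, ecx, edx, esi, edi, ebp */
}
#elif defined(__aarch64__)
static inline long long get_syscall_nr(const struct user_regs_struct *r)
{
    return (long long)r->regs[8];       /* number: x8 */
}
static inline void set_syscall_nr(struct user_regs_struct *r, long long nr)
{
    assert(r != NULL);
    r->regs[8] = nr;
}
static inline unsigned long long get_first_arg(const struct user_regs_struct *r)
{
    return r->regs[0];                  /* args: x0 .. x5 */
}
#else
#error "this sketch only covers x86_64, i386 and aarch64"
#endif
```

None of these branches depends on how the registers were fetched, so a tool like strace would presumably still carry this kind of table if the transport underneath it changed.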
Re: linux-next: add utrace tree
On 01/21, Linus Torvalds wrote: On Thu, 21 Jan 2010, Andrew Morton wrote: ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the past few years. More importantly, we're not ever going to get rid of it. Unfortunately, you are right. The current ptrace (as it is visible from user-space) should stay forever. Quite frankly, judging by all past history we have ever seen in kernel interfaces, new and non-portable interfaces simply are never used. The whole question whether they are nicer or not is entirely immaterial. I have to admit this point looks very reasonable to me. Except, can't resist, ptrace itself is hardly portable. I'm personally very dubious that there are any merits to utrace that outweigh the very clear disadvantages: just another layer that adds a new level of abstraction to the only interface that people actually _use_, namely ptrace. Of course they can't use other interfaces, we don't have them. And without the new abstraction layer we will never have them, I think. Oleg.
Re: linux-next: add utrace tree
Hi - oleg wrote: [...] I'm personally very dubious that there are any merits to utrace that outweigh the very clear disadvantages: just another layer that adds a new level of abstraction to the only interface that people actually _use_, namely ptrace. Of course they can't use other interfaces, we don't have them. And without the new abstraction layer we will never have, I think. This is one of the reasons we built, upon request of lkml people, the utrace-gdbstub prototype (http://lkml.org/lkml/2009/11/30/173). It presents a standard userspace debugging interface -- actually, more standard than ptrace! It has the potential to be more powerful feature-wise and perhaps even perform faster than ptrace. And yet that RFC didn't receive any on-topic review, only wishes for unspecified blue-sky integration with kernel debugging. So then there's uprobes, which is another potential utrace killer app, if it weren't so tainted by some peoples' disdain for its current user, when other users are already being seriously discussed. So a working prototype, which demonstrates both the utility of utrace itself and the end-user value of user-space probing, is disregarded. And there are several smaller utrace clients in the works, each of them merge candidates in the future. Yes, most of them may be rewritten with special-purpose hook after hook as people reinvent the utrace wheel piece by piece, but how long will that take? How is the opportunity cost of missing features valued? Finally, I don't know how to address the logic of "if a feature requires utrace, that's a bad argument for utrace" and at the same time "you need to show a killer app for utrace". What could possibly satisfy both of those constraints? Please advise. - FChE
Re: linux-next: add utrace tree
On Fri, 2010-01-22 at 15:01 -0500, Frank Ch. Eigler wrote: So then there's uprobes, which is another potential utrace killer app That's bollocks, uprobes is an utter and total mis-match for utrace. Probing userspace is primarily about DSOs which is files and vma's, not tasks. You might maybe want a utrace interface to that, but that is largely non-interesting. IOW, we don't need utrace to make sensible use of uprobes. (And when I speak of uprobes I mean the thing formerly called UBP)
Re: linux-next: add utrace tree
On 01/21, Linus Torvalds wrote: I realize that my argument is very anti-thetical to the normal CS teaching of general-purpose is good. I often feel that very specific code with very clearly defined (and limited) applicability is a good thing - I'd rather have just a very specific ptrace layer that does nothing but ptrace, than a generic plugin layer that can be layered under ptrace and other things. I am repeating the same (and probably poor) arguments, but we don't have a clearly defined ptrace layer. The current code is just the set of precedents, I mean, this code does this because we always did this for unknown reason. And we can't fix it without breaking things. Even the obvious bugs which could be fixed by the very simple patch should be preserved sometimes. In fact, afaics the current state is: if it can't crash the kernel - it is not the bug. Otoh, ptrace is very limited, yes. Imho - too limited. And, as a user-space api, it is just horrible. However: we're not ever going to get rid of it. Yes, sure. But I am afraid this all is almost off-topic. Afaik, utrace was not created to solve the problems with ptrace, at least I am sure this wasn't the only goal. Unfortunately, I didn't participate in other projects which use utrace. Even if I did, I don't know how could I prove they are important enough to have a generic layer to make other things possible. Oleg.
Re: linux-next: add utrace tree
Hi - On Fri, Jan 22, 2010 at 09:16:16PM +0100, Peter Zijlstra wrote: [...] So then there's uprobes, which is another potential utrace killer app That's bollocks, uprobes is an utter and total mis-match for utrace. Probing userspace is primarily about DSOs which is files and vma's, not tasks. [...] Your experience with user-space probing apparently differs from ours. In fact there exists plenty of interest and utility in probing given processes only, if for no other reason than to avoid disrupting others running on the machine. Nearly always, it is better to build a multiprocess probing widget from multiply-applied single-process ones, rather than to build single-process probing from grossly-filtered systemwide/VMA ones. (If the lower level infrastructure provides both options, groovy.) - FChE
Re: linux-next: add utrace tree
Hi - On Fri, Jan 22, 2010 at 01:59:11PM -0800, Linus Torvalds wrote: [...] Finally, I don't know how to address the logic of if a feature requires utrace, that's a bad argument for utrace and at the same time you need to show a killer app for utrace. What could possibly satisfy both of those constraints? Please advise. The point is, the feature needs to be a killer feature. And I have yet to hear _any_ such killer feature, especially from a kernel maintenance standpoint. The better ptrace than ptrace is irrelevant. Sure, we all know ptrace isn't a wonderful feature. But it's there, and a debugger is going to have support for it anyway, so what's the _advantage_ of a better ptrace interface? There is absolutely _zero_ advantage, there's just yet another interface. We can't get rid of the old one _anyway_. The point is that the intermediate api will allow (and, as the part you clipped out about utrace-gdbstub said, *already has allowed*) alternative plausible interfaces that coexist just fine. And the seccomp replacement just sounds horrible. Using some tracing interface to implement security models sounds like the worst idea ever. So all this is about *naming* utrace? It was never built for tracing, but for (efficient/multiplexed) *control*. That wasn't even its original name -- one of your lieutenants asked roland to change it to utrace. And like it or not, over the last almost-decade, _not_ having to have to work with system tap has been a feature, not a problem, for the kernel community. I don't have a problem with that. We have apprx. never imposed anything on developers who didn't want to use it. There are plenty who have and will. - FChE
Re: linux-next: add utrace tree
On Fri, 22 Jan 2010, Frank Ch. Eigler wrote: The point is that the intermediate api will allow (and, as the part you clipped out about utrace-gdbstub said, *already has allowed*) alternative plausible interfaces that coexist just fine. And my point is that multiple interfaces are BAD. There is one interface we _have_ to have: the traditional ptrace one. That one we can't get away from. Multiple interfaces on its own is just confusion with no upside. You need a _reason_ to have other interfaces. They need to have that killer feature. Just being different is not a feature at all. So all this is about *naming* utrace? It was never built for tracing, but for (efficient/multiplexed) *control*. That wasn't even its original name -- one of your lieutenants asked roland to change it to utrace. No. It's not about naming. It's about the downside of having amorphous interfaces that apparently don't even have rules, and are then used to implement random crap. Yes, the SNL skit about It's a dessert topping _and_ a floor wax was funny, but it was funny exactly because it was crazy. The fact that you can do crazy things is not a good thing. You need to find the goodness somewhere else, and that's what I'm trying to tell you. You just seem to have trouble listening. Linus
Re: linux-next: add utrace tree
On Fri, 22 Jan 2010, Linus Torvalds wrote: No. It's not about naming. It's about the downside of having amorphous interfaces that apparently don't even have rules, and are then used to implement random crap. Yes, the SNL skit about It's a dessert topping _and_ a floor wax was funny, but it was funny exactly because it was crazy. Put yet another way: I'd _much_ rather have two totally separate pieces that don't depend on each other, and do different things. So to take a very practical example: I'd much rather have 'seccomp' and 'ptrace' that have _nothing_ what-so-ever to do with each other, than have some intermediate layer that then needs to make both of those happy, and that both have to interact with. There are cases where we really _want_ to have common code. We want to have a common VFS interface because we want to show _one_ interface to user space across a gazillion different filesystems. We want to have a common driver layer (as far as possible) because - again - we expose a metric shitload of drivers, and we want to have one unified interface to them. But going the other way: trying to share code when the interfaces are fundamentally _different_ is generally not at all such a great idea. It ends up tying two conceptually totally separate things together, and suddenly people who work on feature X need to modify infrastructure that affects feature Y, and it turns out that details A, B and C are all totally different for the two features and the middle layer has two conflicting things it needs to work with. This is why when somebody brought up you could do a seccomp-like thing on top of utrace that my reaction was and is just totally negative. It shows all the wrong kinds of tying things together. Linus
Re: linux-next: add utrace tree
On Fri, Jan 22, 2010 at 19:22, Linus Torvalds torva...@linux-foundation.org wrote: There are cases where we really _want_ to have common code. We want to have a common VFS interface because we want to show _one_ interface to user space across a gazillion different filesystems. We want to have a common driver layer (as far as possible) because - again - we expose a metric shitload of drivers, and we want to have one unified interface to them. So... Everybody agrees that ptrace() is horrible and a royal pain to use, let alone use correctly and without bugs. Everybody also agrees that ptrace() needs to stay around for a long time to avoid breaking all the existing users. Now how do we get from here to a moderately portable API for interrogating, controlling, and intercepting process state? Essentially it would need to support all of the things that a powerful debugger would want to do, including modifying registers and memory, substituting syscall return values, etc. I believe that utrace is the kernel side of that API. The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. The *second* killer app for this is to make it much easier for people to write new userspace debugging tools. I love the various crash-catching tools that different distributions or applications provide, but they all basically have to trap the SIGSEGV and hope they're still sensible enough to fork() and exec() a gdb process. Furthermore, I would love to be able to write debugging tools for scripting languages that allow me to step across Perl, C, PHP, assembly code, etc, all within the same process. In theory that's all possible today, but given how much of a *pain* ptrace() is to use correctly, nobody bothers. 
Now, with all that said, utrace does not provide any of the userspace side APIs today... but I think it is a necessary refactoring if we want to provide a new ideal process-introspection interface without breaking all the ptrace() users. Think of the utrace interface as very much like the LSM interface. Just like with LSMs, there is a lot of active research in debugging and tracing tools, and nobody can even remotely agree what the hell they want out of the hooks. In theory you could add one hook for every place each security module needs one... but then your fast-path is littered with always-false test-and-jump statements. What utrace provides is the one single test in each fast path that then searches for and executes the appropriate slow path(s) for that process. I personally would be very happy to see utrace merged. Cheers, Kyle Moffett
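The "one single test in each fast path" idea described above can be sketched in plain userspace C. This is only an illustration of the dispatch pattern under stated assumptions: the type and function names (struct engine, attach_engine, report_event) are invented for this example and are not the utrace API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits that a hook site can report. */
enum trace_event {
    EV_SYSCALL_ENTRY = 1u << 0,
    EV_SIGNAL        = 1u << 1,
    EV_EXEC          = 1u << 2
};

/* One attached tool ("engine") and the events it wants to hear about. */
struct engine {
    unsigned int mask;
    void (*callback)(struct engine *self, unsigned int event);
    unsigned int hits;
    struct engine *next;
};

/* Per-task state: the union of all engine masks plus the engine list. */
struct task {
    unsigned int event_mask;
    struct engine *engines;
};

/* Example callback that only counts how often it ran. */
static void count_hit(struct engine *self, unsigned int event)
{
    (void)event;
    self->hits++;
}

static void attach_engine(struct task *t, struct engine *e)
{
    assert(e->callback != NULL);
    e->next = t->engines;
    t->engines = e;
    t->event_mask |= e->mask; /* keep the fast-path mask up to date */
}

/* Hook site: one test on the fast path, list walk only on the slow path. */
static unsigned int report_event(struct task *t, unsigned int event)
{
    unsigned int ran = 0;

    if (!(t->event_mask & event))
        return 0; /* fast path: nobody attached cares about this event */

    for (struct engine *e = t->engines; e != NULL; e = e->next) {
        if (e->mask & event) {
            e->callback(e, event);
            ran++;
        }
    }
    return ran;
}
```

An untraced task pays for one mask test per hook site regardless of how many kinds of tools exist, and several engines can be attached to the same task at once.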
Re: linux-next: add utrace tree
Hi - On Thu, Jan 21, 2010 at 04:31:45PM -0800, Andrew Morton wrote: [...] Someone please sell this to us. Here's what Oleg said last time I asked this: [...] I wonder if Roland/Oleg are being too modest in their current role as ptrace maintainers. Considering that *they* think of utrace as a means toward proper refactoring of ptrace, how much further burden of proof should they shoulder? To what extent are other subsystem maintainers required to sell reworkings of their areas, when there appear to be no drawbacks and at least arguable benefits? - FChE
Re: linux-next: add utrace tree
Hi - On Thu, Jan 21, 2010 at 05:05:41PM -0800, Andrew Morton wrote: [...] ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the past few years. This leads one to expect that a rip-out-n-rewrite is a high-risk prospect. So, quite reasonably, one looks for a good reason for taking such risk. [...] To the extent the discussion is colored by risk avoidance, then the answer to that would consist of code reviews, and of course a look at the actual historical reliability of this code. While some might enjoy reminding us about the brief kerneloops incident in 2008, let's keep in mind that versions of this code have been deployed in Fedora and RHEL for several *years*, with millions of users. It's not some rickety experiment. To the extent the discussion is colored by the new features enabled from this refactoring, well, there is Oleg's list which may or may not have mentioned enabling systemtap's user-space probing. More details can be furnished on demand. Several of the use examples were constructed in good faith upon request from the kernel community asking for more and more. But what's enough? Who knows, really? - FChE
Re: linux-next: add utrace tree
On Thu, 21 Jan 2010, Andrew Morton wrote: ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the past few years. More importantly, we're not ever going to get rid of it. Quite frankly, judging by all past history we have ever seen in kernel interfaces, new and non-portable interfaces simply are never used. The whole question whether they are nicer or not is entirely immaterial. I'm personally very dubious that there are any merits to utrace that outweigh the very clear disadvantages: just another layer that adds a new level of abstraction to the only interface that people actually _use_, namely ptrace. But I haven't followed utrace. I doubt _anybody_ has, except for the utrace people themselves. Linus
Re: linux-next: add utrace tree
On Thu, 21 Jan 2010, Frank Ch. Eigler wrote: To the extent the discussion is colored by the new features enabled from this refactoring, well, there is Oleg's list which may or may not have mentioned enabling systemtap's user-space probing. Let's face it, system tap isn't going to be merged, so why even bring it up? Every kernel developer I have _ever_ seen agrees that all the new tracing is a million times superior. I'm sure there are system tap people who disagree, but quite frankly, I don't see it being merged considering how little the system tap people ever did for the kernel. So if things like system tap and security models that go behind the kernel by tying into utrace are the reasons for utrace, color me utterly uninterested. In fact, color me actively hostile. I think that's the worst possible situation that we'd ever be in as kernel people (namely exactly the do things in kernel space by hiding behind utrace without having kernel people involved) Linus
Re: linux-next: add utrace tree
Hi - On Thu, Jan 21, 2010 at 05:32:47PM -0800, Linus Torvalds wrote: [...] To the extent the discussion is colored by the new features enabled from this refactoring, well, there is Oleg's list which may or may not have mentioned enabling systemtap's user-space probing. Let's face it, system tap isn't going to be merged, so why even bring it up? It was certainly not meant to derail the discussion about the merits of utrace as a useful cleanup API in its own right, but rather to be an example of what kinds of things become straightforward in its presence. You may be aware of nascent efforts to bring the same uprobes infrastructure to perf. Every kernel developer I have _ever_ seen agrees that all the new tracing is a million times superior. [...] And that is fine. We believe there is plenty of space in the problem domain for different approaches. ... considering how little the system tap people ever did for the kernel. Less passionate analysis would identify a long history of contribution by the greater affiliated team, including via merged code and by passing on requirements and experiences. We have been trying to share as much as you have been willing to take. While systemtap's current codebase may not (and need not) have a future inside the kernel, chances are good that improvements in common infrastructure will allow systemtap to shrink and change enough that the question becomes moot. - FChE
Re: linux-next: add utrace tree
On Thu, 21 Jan 2010, Frank Ch. Eigler wrote: Less passionate analysis would identify a long history of contribution by the greater affiliated team, including via merged code and by passing on requirements and experiences. The reason I'm so passionate is that I dislike the turn the discussion was taking, as if utrace was somehow _good_ because it allowed various other interfaces to hide behind it. And I'm not at all convinced that is true. And I really didn't want to single out system tap, I very much feel the same way about some seccomp-replacement security model that the kernel doesn't even need to know about thing. So don't take the systemtap part to be the important part, it's the bigger issue of I'd much rather have explicit interfaces than have generic hooks that people can then use in any random way. I realize that my argument is very antithetical to the normal CS teaching of general-purpose is good. I often feel that very specific code with very clearly defined (and limited) applicability is a good thing - I'd rather have just a very specific ptrace layer that does nothing but ptrace, than a generic plugin layer that can be layered under ptrace and other things. In one case, you know exactly what the users are, and what the semantics are going to be. In the other, you don't. So I really want to see a very big and immediate upside from utrace. Because to me, the it's a generic layer with any application you want to throw at it is a _downside_. Linus
Re: linux-next: add utrace tree
On Thu, Jan 21, 2010 at 05:28:42PM -0800, Linus Torvalds wrote: On Thu, 21 Jan 2010, Andrew Morton wrote: ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the past few years. More importantly, we're not ever going to get rid of it. FWIW, Oleg's implementation of ptrace over utrace is 100% compatible with legacy ptrace; gdb testsuite indicates that (http://lkml.org/lkml/2009/12/21/98). Ananth
Fw: Re: linux-next: add utrace tree
Hi Roland, Oleg, Would it be a good idea to start looking at a user space API for utrace? By doing that we would get the use cases that maintainers on LKML are looking for and could start looking at its usefulness. Currently it's probably a chicken-and-egg case where they look at what additional benefit end customers are getting from utrace and we are looking at providing the user interface after the bits go in. -- Thanks and Regards Srikar ---BeginMessage--- On Fri, 22 Jan 2010 11:17:47 +1100 Stephen Rothwell s...@canb.auug.org.au wrote: Any thoughts? I'm nearly a week behind again and am trying to avoid thinking. I've had a(n old) version of utrace in -mm for ages and it didn't break anything. I still don't think I've seen a really compelling reason for merging it. At least, I wouldn't be able to explain why we did it. But presumably there _are_ such reasons, because it was a lot of development work. Someone please sell this to us. ---End Message---
Re: linux-next: add utrace tree
On Wed, Jan 20, 2010 at 12:10:26PM +0530, Ananth N Mavinakayanahalli wrote: It will cause conflicts with various other trees and increases the overhead all around. It also causes us to trust linux-next bugreports less - as it's not the 'next Linux' anymore. Also, there's virtually no high-level technical review done in linux-next: the trees are implicitly trusted (because they are pushed by maintainers), bugs and conflicts are reported but otherwise it's a neutral tree that includes pretty much any commit indiscriminately. If you need review and testing there's a number of trees you can get inclusion into. So would -tip be one of them? If so could you pull the utrace-ptrace branch in? Or did you intend some other tree (random-tracing)? (Though I think a ptrace reimplementation isn't 'random'-tracing :-)) Heh. No this is a tree I use for, well, random tracing patches indeed, which has extended to random tracing/perf/* patches over time. I sometimes relay other's patches to Ingo toward this tree but this is usually about small volumes and for short-term storage: patches that have been reviewed/acked already. utrace/uprobe is about high volume and longer time debate/review/maintenance and I won't have the time to carry this. Ananth
Re: linux-next: add utrace tree
Hi - On Wed, Jan 20, 2010 at 05:59:59PM +1100, Stephen Rothwell wrote: [...] Including experimental code that is RFC and which is not certain to go upstream is certainly not the purpose of linux-next though. Ingo is correct in what he says here. See the boilerplate: [...] Basically, this should be just what you would send to Linus (or ask him to fetch). I will remove this tree from linux-next tomorrow and wait until it is more ready for mainline inclusion. Please reconsider. Ingo mistook what was being proposed. We request merge/integration testing for just the set of patches posted http://lkml.org/lkml/2009/12/17/466, which was in response to peterz's earlier review comments, and none of which is labeled or considered RFC or experimental. Ananth was right that the utrace-ptrace git branch represents this rather than master. - FChE
Re: linux-next: add utrace tree
Frank, please be clear as to which branch you want included (master or utrace-ptrace). Also note that neither of those branches matches what was posted in the sense that they both have lots of history and merges not represented in the patches. (I assume that they do produce the same final source tree, though). Yes, the trees do match. I certainly never expected our ancient git history to get merged in directly upstream. I've made a new branch on: git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git called: next/master (Actually it's on master.kernel.org and the public mirror is being a little slow as I write this.) This starts from v2.6.33-rc4 and then has commits for the 7 patches that Oleg posted in December. Beyond that, we've added one follow-on patch to fix a bug Oleg just tracked down (Oleg will post that patch soon). And I've added one more commit with a MAINTAINERS update, shown below. You can also find the same stuff from the series file and patch files in: http://people.redhat.com/utrace/2.6-next/ If it makes things easier for linux-next to have this git branch either rebased or merged from a different fork point, please let me know. Thanks, Roland --- [PATCH] MAINTAINERS: add utrace This updates the ptrace entry to cover utrace too. They are part of the same maintenance effort. Also add the utrace mailing list. 
Signed-off-by: Roland McGrath rol...@redhat.com
---
 MAINTAINERS | 7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8f47bf..8da2a0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4375,15 +4375,18 @@ M: Jim Paris j...@jtan.com
 L: cbe-oss-...@ozlabs.org
 S: Maintained
 
-PTRACE SUPPORT
+PTRACE AND UTRACE SUPPORT
 M: Roland McGrath rol...@redhat.com
 M: Oleg Nesterov o...@redhat.com
+L: utrace-devel@redhat.com
 S: Maintained
 F: include/asm-generic/syscall.h
 F: include/linux/ptrace.h
 F: include/linux/regset.h
 F: include/linux/tracehook.h
-F: kernel/ptrace.c
+F: include/linux/utrace.h
+F: kernel/ptrace*
+F: kernel/utrace*
 
 PVRUSB2 VIDEO4LINUX DRIVER
 M: Mike Isely is...@pobox.com
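As a side note on the F: lines in the patch above: switching from kernel/ptrace.c to kernel/ptrace* (and adding kernel/utrace*) widens the set of files the entry covers. The following is only a rough Python sketch of how such wildcard patterns could be checked against a path. The matching rules (a wildcard stays within one directory level, a trailing slash covers a whole directory) are assumptions made for illustration and are not taken from the kernel's own tooling; the helper name covered is hypothetical.

```python
from fnmatch import fnmatchcase

# F: patterns of the updated "PTRACE AND UTRACE SUPPORT" entry, copied from the patch.
PATTERNS = [
    "include/asm-generic/syscall.h",
    "include/linux/ptrace.h",
    "include/linux/regset.h",
    "include/linux/tracehook.h",
    "include/linux/utrace.h",
    "kernel/ptrace*",
    "kernel/utrace*",
]


def covered(path, patterns=PATTERNS):
    """Return True if path matches any F: pattern (approximate, assumed semantics)."""
    for pattern in patterns:
        if pattern.endswith("/"):
            # Assumed rule: a trailing slash covers everything below that directory.
            if path.startswith(pattern):
                return True
        elif fnmatchcase(path, pattern) and path.count("/") == pattern.count("/"):
            # Assumed rule: a wildcard does not cross directory boundaries,
            # so the path must have as many slashes as the pattern.
            return True
    return False


if __name__ == "__main__":
    for path in ("kernel/ptrace.c", "kernel/utrace.c",
                 "kernel/ptrace/x.c", "kernel/fork.c"):
        print(path, covered(path))
```

Under these assumptions the old single-file entry and any new kernel/ptrace* or kernel/utrace* file are both covered, while unrelated files such as kernel/fork.c are not.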
Re: linux-next: add utrace tree
Hi Frank, On Tue, 19 Jan 2010 16:16:46 -0500 Frank Ch. Eigler f...@redhat.com wrote: Having been reviewed a couple of times, and we hope being a good candidate for merging next time, please start pulling git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch master I have added this from today with you and utrace-devel as the contacts. I have cc'd the wider community on this email so that people are aware that this has been included. This repo contains frequent merges from Linus' tree. If you'd prefer a cleaner rebase-based branch to pull from, we can make one of those too. For now it is OK, but you might like to ask Linus if he would like it cleaned up before submission since it seems to have history right back to 2.6.29 and (as you say) lots of merges with his tree. You should also add a commit with an entry in MAINTAINERS. [Standard boilerplate] Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgment of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window. You will need to ensure that the patches/commits in your tree/series have been: * submitted under GPL v2 (or later) and include the Contributor's Signed-off-by, * posted to the relevant mailing list, * reviewed by you (or another maintainer of your subsystem tree), * successfully unit tested, and * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary. -- Cheers, Stephen Rothwell s...@canb.auug.org.au Legal Stuff: By participating in linux-next, your subsystem tree contributions are public and will be included in the linux-next trees. You may be sent e-mail messages indicating errors or other issues when the patches/commits from your subsystem tree are merged and tested in linux-next. 
These messages may also be cross-posted to the linux-next mailing list, the linux-kernel mailing list, etc. The linux-next tree project and IBM (my employer) make no warranties regarding the linux-next project, the testing procedures, the results, the e-mails, etc. If you don't agree to these ground rules, let me know and I'll remove your tree from participation in linux-next.
Re: linux-next: add utrace tree
On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: Ingo, Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, etc.) can go upstream in its present form. Agreed, uprobes is still not upstream ready -- it was an RFC. We are working through the comments there to get it ready for merger. IMHO the far more important thing to address beyond formalities and workflow cleanliness are the (many) technical observations and objections offered by Peter Zijlstra on lkml. Not just the git history but also the abstractions and concepts are messy and should be reworked IMO, and also good and working perf events integration should be achieved, etc. I think Oleg addressed most of Peter's concerns on utrace when the ptrace/utrace patchset was reposted. Perf integration with uprobes will be done and discussions have started with Masami and Frederic. There are a couple of fundamental technical aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not through quiesce) that need resolution. The fact that there's a well established upstream workflow for instrumentation patches, which is being routed around by the utrace/uprobes/systemtap code here is not a good sign in terms of reaching a good upstream solution. Let's hope it works out well though. Agreed. On the other hand, having ptrace/utrace in the -next tree will give it a lot more testing, while any outstanding technical issues are being addressed. Stephen, To exercise ptrace/utrace, it would be very useful if you pulled in git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch utrace-ptrace instead of 'master'. Thanks, Ananth
Re: linux-next: add utrace tree
Hi Frank, On Wed, 20 Jan 2010 07:28:34 +0100 Ingo Molnar mi...@elte.hu wrote: Including experimental code that is RFC and which is not certain to go upstream is certainly not the purpose of linux-next though. Ingo is correct in what he says here. See the boilerplate: * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). I will remove this tree from linux-next tomorrow and wait until it is more ready for mainline inclusion. -- Cheers, Stephen Rothwell s...@canb.auug.org.au http://www.canb.auug.org.au/~sfr/
Re: linux-next: add utrace tree
* Ingo Molnar mi...@elte.hu wrote: * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote: On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: Ingo, Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, etc.) can go upstream in its present form. Agreed, uprobes is still not upstream ready -- it was an RFC. We are working through the comments there to get it ready for merger. IMHO the far more important thing to address beyond formalities and workflow cleanliness are the (many) technical observations and objections offered by Peter Zijlstra on lkml. Not just the git history but also the abstractions and concepts are messy and should be reworked IMO, and also good and working perf events integration should be achieved, etc. I think Oleg addressed most of Peter's concerns on utrace when the ptrace/utrace patchset was reposted. Peter is Cc:-ed and he might want to chime in. Perf integration with uprobes will be done and discussions have started with Masami and Frederic. There are a couple of fundamental technical aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not through quiesce) that need resolution. The fact that there's a well established upstream workflow for instrumentation patches, which is being routed around by the utrace/uprobes/systemtap code here is not a good sign in terms of reaching a good upstream solution. Let's hope it works out well though. Agreed. On the other hand, having ptrace/utrace in the -next tree will give it a lot more testing, while any outstanding technical issues are being addressed. Including experimental code that is RFC and which is not certain to go upstream is certainly not the purpose of linux-next though. It will cause conflicts with various other trees and increases the overhead all around. It also causes us to trust linux-next bugreports less - as it's not the 'next Linux' anymore. 
Also, there's virtually no high-level technical review done in linux-next: the trees are implicitly trusted (because they are pushed by maintainers), bugs and conflicts are reported but otherwise it's a neutral tree that includes pretty much any commit indiscriminately. If you need review and testing there's a number of trees you can get inclusion into. Btw., the utrace code has lived in -mm for quite some time - that's an excellent route as Andrew does thorough review and testing. If Andrew agrees with this particular tree as-is and wants these bits to live in linux-next and have it in -mm that way then that's a fair approach obviously and i have no objections ... The point is to have at least one relevant maintainer request and track it and then supervise the completion of it (which includes the resolution of all outstanding objections) and then push it to Linus. Ingo