Re: linux-next: add utrace tree

2010-02-08 Thread H. Peter Anvin

On 02/07/2010 10:54 PM, Pavel Machek wrote:


No, it has nothing to do with ring.  It has to do with modifying code
that another CPU could be executing at the same time, and with modifying
code on the same processor through another virtual alias (they are
different issues.)  The same issues apply regardless of the CPL of the
processor.


...but these are always 'there could be cpu bugs around' issues,
right? Like amd k6. AFAICT x86 always supported self-modifying code
without any extra barriers needed...



*Self*-modifying code, yes.  *Cross*-modifying code, no.

-hpa



Re: linux-next: add utrace tree

2010-02-08 Thread Arjan van de Ven
On Mon, 8 Feb 2010 07:54:25 +0100
Pavel Machek pa...@ucw.cz wrote:

  No, it has nothing to do with ring.  It has to do with modifying
  code that another CPU could be executing at the same time, and with
  modifying code on the same processor through another virtual alias
  (they are different issues.)  The same issues apply regardless of
  the CPL of the processor.
 
 ...but these are always 'there could be cpu bugs around' issues,
 right? Like amd k6. AFAICT x86 always supported self-modifying code
 without any extra barriers needed...

self modifying code yes, cross modifying code no.


-- 
Arjan van de Ven    Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org



Re: linux-next: add utrace tree

2010-02-08 Thread Avi Kivity

On 01/27/2010 01:05 PM, Ananth N Mavinakayanahalli wrote:

We don't need to write one. I don't know how easy it is to make the kvm
emulator less kvm-centric (vcpus, kvm_context, etc). Avi?
   


It's a lot of mindless work but not too difficult; replacing hardcoded 
accessors with function pointers.
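
A minimal sketch of what "replacing hardcoded accessors with function pointers" could look like, written as a user-space Python model rather than kernel C. The names `EmulatorOps`, `emulate_push_reg` and `make_dict_backend` are hypothetical and are not taken from the KVM emulator; the only point is that the emulation routine reaches registers and memory through callbacks supplied by the caller, so a non-KVM user could plug in its own backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EmulatorOps:
    """Hypothetical callback table; the caller decides where state lives."""
    read_reg: Callable[[str], int]
    write_reg: Callable[[str, int], None]
    write_mem: Callable[[int, int], None]


def emulate_push_reg(ops: EmulatorOps, reg: str, word_size: int = 8) -> None:
    """Emulate 'push <reg>' using only the supplied accessors."""
    value = ops.read_reg(reg)              # read first, so 'push %rsp' stores the old rsp
    new_sp = ops.read_reg("rsp") - word_size
    ops.write_mem(new_sp, value)
    ops.write_reg("rsp", new_sp)


def make_dict_backend(regs: Dict[str, int], mem: Dict[int, int]) -> EmulatorOps:
    """One possible backend: plain dictionaries standing in for task state."""
    return EmulatorOps(
        read_reg=lambda name: regs[name],
        write_reg=lambda name, val: regs.__setitem__(name, val),
        write_mem=lambda addr, val: mem.__setitem__(addr, val),
    )


regs = {"rsp": 0x7000, "rbp": 0x1234}
mem: Dict[int, int] = {}
emulate_push_reg(make_dict_backend(regs, mem), "rbp")
```

A second backend (say, one wrapping a vcpu structure) would only need to provide the same three callbacks; `emulate_push_reg` itself would not change.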


--
error compiling committee.c: too many arguments to function



Re: linux-next: add utrace tree

2010-02-07 Thread Pavel Machek
Hi!

  Right, so you're going to love uprobes, which does exactly that. The
  current proposal is overwriting the target instruction with an INT3 and
  injecting an extra vma into the target process's address space
  containing the original instruction(s) and possible jumps back to the
  old code stream.
 
  Just out of interest, how does it handle the threading issue?
 
  Last I saw, at least some CPU people were _very_ nervous about overwriting 
  instructions if another CPU might be just about to execute them.
  
  I think the issue was that ring 0 was never meant to do that, whereas
  ring 3 does it all the time. Doesn't the dynamic library modify its
  text?
 
 No, it has nothing to do with ring.  It has to do with modifying code
 that another CPU could be executing at the same time, and with modifying
 code on the same processor through another virtual alias (they are
 different issues.)  The same issues apply regardless of the CPL of the
 processor.

...but these are always 'there could be cpu bugs around' issues,
right? Like amd k6. AFAICT x86 always supported self-modifying code
without any extra barriers needed...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: linux-next: add utrace tree

2010-01-30 Thread Steven Rostedt
On Fri, 2010-01-29 at 08:42 +0100, Ingo Molnar wrote:
 * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
 
  On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:
  
  ...
  
   Let's compare the two cases via a drawing. Your current uprobes submission
   does:

    [kernel]    do probe thing  single-step trap
                 ^         |     ^           |
                 |         v     |           v
    [user]      INT3       XOL-ins         next ins-stream

    ( add the need for serialization to make sure the whole single-step
      thing does not get out of sync with reality. )

   An emulator approach would do:

    [kernel]    emul-demux-fastpath, do probe thing
                 ^                           |
                 |                           v
    [user]      INT3                       next ins-stream

   far simpler conceptually, and faster as well, because it's one kernel
   entry.
  
  Ingo,
  
  Yes, conceptually, emulation is simpler. In fact, it may even be the
  right thing to do from a housekeeping POV if gdb were enabled to use
  breakpoint assistance in the kernel. However... emulation is not
  easy. Just quoting Peter Anvin:
  
   On the more general rule of interpretation: I'm really concerned about
   having a bunch of partially-capable x86 interpreters all over the
   kernel.  x86 is *hard* to emulate, and it will only get harder as the
   architecture evolves.
  
 -hpa
 
 This is obviously true for a full emulator. Except for the fact that:
 
  Yes, I know you suggested we start with a small subset.
 
 and for the fact that we already have emulators in the kernel.

But this would be emulating userspace instructions, correct?

The kernel is limited to what instructions it can perform, no floating
point for example (of course there are some exceptions). But generally,
the instructions in the kernel should be easier to emulate than in
userspace. Userspace is free to do any wacky thing it wants. Will this
limit the ability to probe apps that take advantage of some strange op
code that the user knows is available on their platform?

-- Steve

 
 Plus we _already_ need to decode instructions for safe kprobing and have the 
 code for that upstream. So it's not like we can avoid decoding the 
 instructions. (and emulating certain instruction patterns is really just a 
 natural next step of a good decoder.)




Re: linux-next: add utrace tree

2010-01-29 Thread Ingo Molnar

* Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:

 On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote:
 
 ...
 
  When we merged kprobes ~10 years ago we made the (rather bad) mistake of
  merging a raw, opaque facility and leaving 'the rest' up to some other
  entity. IBM kprobes hackers vanished the day the original kprobes code
  went upstream and the high level entity never truly materialized
  in-kernel, for nearly a decade!
 
 I don't know what you are referring to here... Kprobes was merged in 2.6.9 
 (~August 2004 -- less than 6 years ago). [...]

Ok, 6 years then :-)

 [...] Since then, we did work on ports to powerpc and s390. We implemented
 kretprobes. We made it much more scalable using RCU; we did the powerpc
 booster to skip single-step when possible, not to mention various bug fixes
 over the years.

Except it had no real in-kernel user.

 Yes, we did not do the perf integration, but perf did not exist then, 
 either.
 
 It's simply wrong to say people 'vanished'.

It certainly was a bit stale for years - and with no real users that's
certainly not a surprise. That has changed recently so i'm not complaining. We
just don't want to repeat the same mistake with uprobes.

Ingo



Re: linux-next: add utrace tree

2010-01-29 Thread Ananth N Mavinakayanahalli
On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote:
 
 * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
 
  On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote:
  
  ...
  
   When we merged kprobes ~10 years ago we made the (rather bad) mistake of
   merging a raw, opaque facility and leaving 'the rest' up to some other
   entity. IBM kprobes hackers vanished the day the original kprobes code
   went upstream and the high level entity never truly materialized
   in-kernel, for nearly a decade!
  
  I don't know what you are referring to here... Kprobes was merged in 2.6.9 
  (~August 2004 -- less than 6 years ago). [...]
 
 Ok, 6 years then :-)

  [...] Since then, we did work on ports to powerpc and s390. We implemented
  kretprobes. We made it much more scalable using RCU; we did the powerpc
  booster to skip single-step when possible, not to mention various bug fixes
  over the years.
 
 Except it had no real in-kernel user.

Not that I want to rebut you Ingo, but there were in-kernel users since 2006
(net/ipv4/tcp_probe.c) :-)

Aside, I am also glad that we have more flexibility with the perf
integration.

Ananth



Re: linux-next: add utrace tree

2010-01-29 Thread Ingo Molnar

* Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:

 On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote:
  
  * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
  
   On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote:
   
   ...
   
    When we merged kprobes ~10 years ago we made the (rather bad) mistake
    of merging a raw, opaque facility and leaving 'the rest' up to some
    other entity. IBM kprobes hackers vanished the day the original kprobes
    code went upstream and the high level entity never truly materialized
    in-kernel, for nearly a decade!
   
   I don't know what you are referring to here... Kprobes was merged in
   2.6.9 (~August 2004 -- less than 6 years ago). [...]
  
  Ok, 6 years then :-)
 
   [...] Since then, we did work on ports to powerpc and s390. We
   implemented kretprobes. We made it much more scalable using RCU; we did
   the powerpc booster to skip single-step when possible, not to mention
   various bug fixes over the years.
  
  Except it had no real in-kernel user.
 
 Not that I want to rebut you Ingo, but there were in-kernel users since 2006
 (net/ipv4/tcp_probe.c) :-)

i said 'real' users. That usage in tcp_probe.c was (and is) really minimal and 
never expanded really.

 Aside, I am also glad that we have more flexibility with the perf 
 integration.

ok, good :)

Ingo



Re: linux-next: add utrace tree

2010-01-29 Thread Frank Ch. Eigler
Ingo Molnar mi...@elte.hu writes:

 [...] So, to sum it up: utrace XOL, which is rather complex already,
 needs even more complexity (which is not yet implemented) than the
 much simpler common-case emulator approach i outlined, just to break
 even with the performance of the much simpler approach. [...]

Is it an uncontroversial claim that emulation of CISC instructions
should perform better than their native execution, followed by an int3
(as in the simplest working scheme) or boosting (as done by kprobes)?
From my experience with simulators, simple software emulation of
cpus can be hundreds of times slower or worse than native execution.

- FChE



Re: linux-next: add utrace tree

2010-01-28 Thread Ingo Molnar

* Jim Keniston jkeni...@us.ibm.com wrote:

 On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
 ...
  I think the best solution for user probes (by far) is to use a simplified
  in-kernel instruction emulator for the few commonly probed instructions.
  (Kprobes already partially decodes x86 instructions to make it safe to
  apply accelerated probes and there's other decoding logic in the kernel
  too.)
  
  The design and practical advantages are numerous:
  
   - People want to probe their function prologues most of the time ...
 a single INT3 there will in most cases just hit the initial stack 
 allocation and that's it.
 
 Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps 
 on x86 (but see below**). [...]

Coverage in practice is all that matters.

Consider the fact that i get 1000 times more bugreports aided by strace, which 
has 1000 times more overhead than even the slowest of uprobes approaches.

This simple fact tells us that while performance matters, it is of little use
if good utility and a clean design are not there. (in fact sane and clean
design will almost automatically result in good performance too down the line,
but i digress.) Faster crap is still crap.

 [...]  Even there, though, we'd have to address the page fault we'd 
 occasionally get when extending the stack vma.

Nope, in the simplest model not even page fault emulation is needed,
get_user()/put_user() would resolve it automatically. You either get the
value with the pagefault resolved, or you get a -EFAULT.

If you concentrate only on the common case then emulation can be _really_ 
simple.
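
To make the "value or -EFAULT" behaviour concrete, here is a toy user-space model in Python of emulating a 32-bit `push %ebp`. It is only a sketch: `UserMemory` and `emulate_push_ebp` are hypothetical names, the `put_user` method merely imitates the success/-EFAULT contract of the kernel helper of the same name, and the stack-growth rule is a deliberate simplification.

```python
import errno

PAGE_SIZE = 4096


class UserMemory:
    """Toy address space: a set of mapped pages plus a stack that may grow."""

    def __init__(self, mapped_pages, stack_limit):
        self.mapped = set(mapped_pages)    # page numbers currently mapped
        self.stack_limit = stack_limit     # lowest address the stack may grow to
        self.words = {}

    def put_user(self, addr, value):
        """Model of a faulting store: 0 on success, -EFAULT on failure."""
        page = addr // PAGE_SIZE
        if page not in self.mapped:
            if addr < self.stack_limit:
                return -errno.EFAULT       # fault could not be resolved
            self.mapped.add(page)          # fault resolved by growing the stack
        self.words[addr] = value
        return 0


def emulate_push_ebp(regs, mem):
    """Emulate 32-bit 'push %ebp'; registers change only if the store worked."""
    new_esp = (regs["esp"] - 4) & 0xFFFFFFFF
    err = mem.put_user(new_esp, regs["ebp"])
    if err:
        return err                         # caller must fall back or deny
    regs["esp"] = new_esp
    return 0


mem = UserMemory(mapped_pages={0xBFFFF}, stack_limit=0xBFFF0000)
regs = {"esp": 0xBFFFF000, "ebp": 0xBFFFF100}
rc_grow = emulate_push_ebp(regs, mem)      # store lands in a not-yet-mapped page

regs2 = {"esp": 0xBFFF0000, "ebp": 1}
rc_fail = emulate_push_ebp(regs2, mem)     # store would go below the stack limit
```

In the first call the store crosses into an unmapped page and the model resolves the fault; in the second the fault cannot be resolved, the error is returned, and `esp` is left untouched.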

Let's compare the two cases via a drawing. Your current uprobes submission
does:

 [kernel]    do probe thing  single-step trap
              ^         |     ^           |
              |         v     |           v
 [user]      INT3       XOL-ins         next ins-stream

 ( add the need for serialization to make sure the whole single-step thing
   does not get out of sync with reality. )

An emulator approach would do:

 [kernel]    emul-demux-fastpath, do probe thing
              ^                           |
              |                           v
 [user]      INT3                       next ins-stream

far simpler conceptually, and faster as well, because it's one kernel entry.

Generally i get nervous if a piece of instrumentation cannot be expressed in
simple ways. _Especially_ if i consider it to concentrate on all the wrong
things and it doesn't even break even with a far less complex scheme.

What would be the 'right things' to concentrate on? Make sure it's an
all-around end-to-end package that is _useful to people_. As of today i have
yet to get a _single_ bugreport or kernel improvement requested by an
application writer who found out about the inefficiencies in his app using
uprobes. There is a gaping hole of utility here, a whole cathedral of tools
written that just a handful of ordinary Linux people use. There's a big
disconnect and i can say one thing for sure: needless complexity in the wrong
places can outright stifle tools from becoming good.

  We could get quite good coverage (and very fast emulation) for the common
  case in not too much code - and much of that code we already have
  available. No re-trapping,
 
 As previously discussed, boosting would also get rid of the single-step trap 
 for most instructions.

Boosting is not in the uprobes patch-set you submitted. Even with it present
it won't get rid of the initial INT3. So basically _best-case_ (with boosting)
XOL-uprobes could roughly break even with a pure emulator approach ...

That's a big and fundamental difference.
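
The comparison can be restated as a small counting model. This is a sketch only: the `Scheme` names and step descriptions are made up for illustration, and "boosted" is modeled the way it is described in this thread (the copied instruction is followed by a jump back, so no single-step trap is taken).

```python
from enum import Enum


class Scheme(Enum):
    XOL_SINGLE_STEP = "xol-single-step"
    XOL_BOOSTED = "xol-boosted"
    EMULATION = "emulation"


def probe_hit_steps(scheme):
    """Return the ordered (mode, action) steps for one probe hit."""
    if scheme is Scheme.EMULATION:
        steps = [("kernel", "INT3 trap: run probe handler, emulate instruction")]
    else:
        steps = [("kernel", "INT3 trap: run probe handler")]
        if scheme is Scheme.XOL_SINGLE_STEP:
            steps.append(("user", "execute copied instruction in XOL slot"))
            steps.append(("kernel", "single-step trap: fix up registers"))
        else:
            steps.append(("user", "execute copied instruction, then jump back"))
    steps.append(("user", "continue with next instruction"))
    return steps


def kernel_entries(scheme):
    """Count how many times one probe hit enters the kernel."""
    return sum(1 for mode, _ in probe_hit_steps(scheme) if mode == "kernel")


entries = {scheme.value: kernel_entries(scheme) for scheme in Scheme}
```

Under these assumptions single-stepped XOL enters the kernel twice per hit, while boosted XOL and emulation each enter it once, which is the "roughly break even" case above.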

  no extra instruction patching 
 
 x86_64 rip-relative instructions are the only ones we alter.
 
 and complex maintenance of trampolines.
  
   - It's as transparent as it gets - no user-space trampoline or other
     visible state that modifies behavior or can be stomped upon by
     user-space bugs.
 
 The XOL vma isn't writable from user space, so I can't think of how it could 
 be clobbered merely by a stray memory reference. [...]

Well there must be some purpose to the instrumentation, there must be some way 
to save data, right? If yes and it's in user-space, that data is clobberable. 
If it's in kernel-space then we have to enter the kernel anyway (with similar 
cost patterns to an INT3 entry) - so we just delayed the kernel entry.

So IMHO you have designed in considerable complexity for little immediate 
benefit.

 [...]  Yes, it's a vma that the unprobed app would never have; and yes, a 
 malicious app or kernel module could remove it or alter the protection and 
 scribble on it.  We don't try to defend the app against such malicious 
 attacks, but we do our best to ensure that the kernel side handles such 
 attacks gracefully.
 
   - Lightweight and simple probe insertion: no weird setup sequence needing 
  the 

Re: linux-next: add utrace tree

2010-01-28 Thread Benjamin Herrenschmidt
On Mon, 2010-01-25 at 08:52 -0800, Linus Torvalds wrote:
 
 That said, I also suspect that people should still look seriously at 
 simply just improving ptrace. For example, I suspect that the biggest 
 problem with ptrace is really just the signalling, and that creating a
 new 
 extension for JUST THAT, and then having a model where you can choose
 - at 
 PTRACE_ATTACH time - how to wait for events would be a good thing.

like returning a fd to poll() on ? :-)

Cheers,
Ben.




Re: linux-next: add utrace tree

2010-01-28 Thread Linus Torvalds


On Fri, 29 Jan 2010, Benjamin Herrenschmidt wrote:
 
 like returning a fd to poll() on ? :-)

Well, there's the possibility of async polling (rather than the 
synchronous wait that ptrace forces now), but there are other advantages 
to having a connection model - like not having to look up the child 
process every time like ptrace does now.

Although 'find_task_by_vpid()' is probably cheap enough that nobody really 
cares. We do a fair job at those hash tables.

Linus



Re: linux-next: add utrace tree

2010-01-28 Thread Jim Keniston
On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote:
 * Jim Keniston jkeni...@us.ibm.com wrote:
 
  On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
  ...
  
  Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps 
  on x86 (but see below**). [...]
 
...
 
  [...]  Even there, though, we'd have to address the page fault we'd 
  occasionally get when extending the stack vma.
 
 Nope, in the simplest model not even page fault emulation is needed,
 get_user()/put_user() would resolve it automatically. You either get the
 value with the pagefault resolved, or you get a -EFAULT.

get_user()/put_user() have to be done in a context where you can sleep,
right?  Uprobes currently operates in such contexts, but there's some
talk of moving it all to a DIE_INT3 notifier context, where it can't
sleep.

...

 
   We could get quite good coverage (and very fast emulation) for the
   common case in not too much code - and much of that code we already
   have available. No re-trapping,

  As previously discussed, boosting would also get rid of the single-step
  trap for most instructions.

 Boosting is not in the uprobes patch-set you submitted. Even with it present
 it won't get rid of the initial INT3. So basically _best-case_ (with boosting)
 XOL-uprobes could roughly break even with a pure emulator approach ...
 
 That's a big and fundamental difference.

To be fair, wrt uprobes, emulation and boosting are both in the same
state: pretty well understood, but not yet implemented.

...
   
    - It's as transparent as it gets - no user-space trampoline or other
      visible state that modifies behavior or can be stomped upon by
      user-space bugs.

  The XOL vma isn't writable from user space, so I can't think of how it
  could be clobbered merely by a stray memory reference. [...]

 Well there must be some purpose to the instrumentation, there must be some
 way to save data, right? If yes and it's in user-space, that data is
 clobberable.

One or two others have advocated an approach (which eliminates the
breakpoint trap) where trace data is stored in the uprobe vma, but I
haven't.  (In such a case, XOL vma would be a misnomer.)  I agree that
in such a scenario, the uprobe vma would of necessity be writable by the
app.

  
 If it's in kernel-space then we have to enter the kernel anyway (with similar 
 cost patterns to an INT3 entry) - so we just delayed the kernel entry.

This seems to presume that you have to extract trace data from the
kernel every time a probe is hit.  In actual practice, you're often just
checking for unusual arg values, incrementing a counter, or some such.

 
...
  Even if we add emulation, it seems sensible to keep the XOL approach as a
  backup to handle instructions that aren't yet emulated (and architectures
  that don't yet have emulators).  That way, if you don't probe any
  unemulated instructions, the XOL vma is never created.

 To turn the argument around: an in-kernel emulator is an all-around facility
 to make sure we probe safely and securely, _and_ it is also more portable
 because it's simpler (because more gradual) to implement on a new
 architecture as you don't actually have to copy around instructions (and
 make sure they work in that new place), but have to emulate a limited
 subset of the instruction space, on purely local state.

I understand the desire to start small and simple and grow gradually
from there.  We thought we were doing that.  Single-stepping out of line
has been in use for close to a decade, maybe more; and boosting (in
kprobes) has been around for a few years as well.  To the *probes folks,
it feels pretty solid.

 
...
 
 With an emulator (assuming the emulator is correct) we can execute the
 precise semantics of that instruction in that place - without any
 side-effects from trampolining/replacement.

And of course, our view has been that the best way to achieve the effect
of the instruction, including all desired side-effects, is to execute
the instruction on the CPU.

...
  
  **In practice, we've had to probe all sorts of instructions, including FP
  instructions -- especially where you want to exploit the debug info to get
  the names, types, and locations of variables and args.  For some compilers
  and architectures, the debug info isn't reliable until the end of the
  function prologue, at which point you could find any old instruction.
  Ditto if you want to probe statements within a function.

 For those cases, frankly, the right approach is to fix the debug info (or
 introduce a new one) and forget the old crap.

 You treat debuginfo as some god-given property, while it's one of the
 suckiest aspects of all of Linux. But we've had that discussion months (and
 years) ago. It has improved in gcc 4.5 so there's some hope.

Yes, there seems to be considerable movement toward better debug info --
which could make statement probing (and not just 

Re: linux-next: add utrace tree

2010-01-28 Thread Ananth N Mavinakayanahalli
On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:

...

 Let's compare the two cases via a drawing. Your current uprobes submission
 does:

  [kernel]    do probe thing  single-step trap
               ^         |     ^           |
               |         v     |           v
  [user]      INT3       XOL-ins         next ins-stream

  ( add the need for serialization to make sure the whole single-step thing
    does not get out of sync with reality. )

 An emulator approach would do:

  [kernel]    emul-demux-fastpath, do probe thing
               ^                           |
               |                           v
  [user]      INT3                       next ins-stream

 far simpler conceptually, and faster as well, because it's one kernel entry.

Ingo,

Yes, conceptually, emulation is simpler. In fact, it may even be the
right thing to do from a housekeeping POV if gdb were enabled to use
breakpoint assistance in the kernel. However... emulation is not
easy. Just quoting Peter Anvin:

 On the more general rule of interpretation: I'm really concerned about
 having a bunch of partially-capable x86 interpreters all over the
 kernel.  x86 is *hard* to emulate, and it will only get harder as the
 architecture evolves.

   -hpa

Yes, I know you suggested we start with a small subset.

We already have an implementation of instruction emulation in kernel for
x86 and powerpc, but it's too KVM-centric. If there is a generic
emulation layer, we would use it.

There are conflicting opinions for either case; complicated as it is,
the XOL scheme works and, to a large extent, it is easily extendable to
other architectures compared to the emulation approach. Uprobes can be
made to use emulation when possible/available, but I don't think this
should be a gating decision for the initial implementation of the feature.

Ananth



Re: linux-next: add utrace tree

2010-01-27 Thread Ingo Molnar

* Peter Zijlstra pet...@infradead.org wrote:

 On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote:
  
  On Tue, 26 Jan 2010, Tom Tromey wrote:
   
   In non-stop mode (where you can stop one thread but leave the others
   running), gdb wants to have the breakpoints always inserted.  So,
   something must emulate the displaced instruction.
  
  I'm almost totally uninterested in breakpoints that actually re-write 
  instructions. It's impossible to do that efficiently and well, especially 
  in threaded environments.
  
  So if you do instruction rewriting, I can only say that's your problem.
 
 Right, so you're going to love uprobes, which does exactly that. The current 
 proposal is overwriting the target instruction with an INT3 and injecting an 
 extra vma into the target process's address space containing the original 
 instruction(s) and possible jumps back to the old code stream.
 
 I'm all in favor of not doing that extra vma and instead use stack or TLS
 space, but then people complain about having to make that executable (which
 is something I don't really mind, x86 had executable everything for very
 long, and also, it's only so when debugging the thing anyway).

I think the best solution for user probes (by far) is to use a simplified
in-kernel instruction emulator for the few commonly probed instructions.
(Kprobes already partially decodes x86 instructions to make it safe to apply
accelerated probes and there's other decoding logic in the kernel too.)

The design and practical advantages are numerous:

 - People want to probe their function prologues most of the time ...
   a single INT3 there will in most cases just hit the initial stack 
   allocation and that's it. We could get quite good coverage (and very fast 
   emulation) for the common case in not too much code - and much of that code 
   we already have available. No re-trapping, no extra instruction patching 
   and complex maintenance of trampolines.

 - It's as transparent as it gets - no user-space trampoline or other visible
   state that modifies behavior or can be stomped upon by user-space bugs.

 - Lightweight and simple probe insertion: no weird setup sequence needing the 
   stopping of all tasks to install the trampoline. We just add the INT3 and 
   off you go.

 - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on 
   task local state.

 - The points we can probe are never truly limited as it's all freely
   upscalable: if you cannot probe an instruction you want to probe today,
   extend the emulator. Deny the rest. _All_ versions of uprobes code i've
   seen so far already restrict the probe-compatible instruction set:
   RIP-relative instructions are excluded on 64-bit for example.

 - Emulation has the _least_ semantic side effects as we really execute
   'that' instruction - not some other instruction put elsewhere into a
   special vma or into the process/thread stack, or some special in-kernel
   trampoline, etc.

 - Emulation can be very fast for the common case as well. Nobody will probe
   weird, complex instructions. They will use 'perf probe' to insert probes
   into their functions 90% of the time ...

 - FPU and complex ops and pagefault emulation is not really what i'd expect
   to be necessary for simple probing - but it _can_ be added by people who
   care about it, if they so wish.
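
The "emulate the common case, deny the rest" idea from the list above can be sketched as a check made at probe-insertion time. This is a toy Python model, not the kprobes decoder: the table `EMULATABLE` and the helpers `classify` and `can_insert_probe` are hypothetical, and only a handful of 32-bit prologue encodings are listed.

```python
# Leading bytes of a few 32-bit prologue instructions -> (mnemonic, length).
EMULATABLE = {
    bytes([0x55]): ("push %ebp", 1),
    bytes([0x89, 0xE5]): ("mov %esp,%ebp", 2),
    bytes([0x83, 0xEC]): ("sub $imm8,%esp", 3),   # third byte is the immediate
    bytes([0x90]): ("nop", 1),
}


def classify(code: bytes):
    """Return (mnemonic, length) if the instruction at code[0] is supported."""
    for prefix, (name, length) in EMULATABLE.items():
        if code.startswith(prefix) and len(code) >= length:
            return name, length
    return None


def can_insert_probe(code: bytes) -> bool:
    """Allow a probe only where the displaced instruction can be emulated."""
    return classify(code) is not None


# push %ebp; mov %esp,%ebp; sub $0x18,%esp
prologue = bytes([0x55, 0x89, 0xE5, 0x83, 0xEC, 0x18])
fldz = bytes([0xD9, 0xEE])    # an FP instruction, not in the table
```

Extending coverage then means adding a table entry together with its emulation routine; anything not listed is refused up front rather than mishandled at probe-hit time.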

Such a scheme would be _far_ more preferable from a maintenance POV as well,
as the initial code will be small, and we can extend it gradually. All the 
other proposals are complex 'all or nothing' schemes with no flexibility for 
complexity at all.

Thanks,

Ingo



Re: linux-next: add utrace tree

2010-01-27 Thread Linus Torvalds


On Wed, 27 Jan 2010, Peter Zijlstra wrote:
 
 Right, so you're going to love uprobes, which does exactly that. The
 current proposal is overwriting the target instruction with an INT3 and
 injecting an extra vma into the target process's address space
 containing the original instruction(s) and possible jumps back to the
 old code stream.

Just out of interest, how does it handle the threading issue?

Last I saw, at least some CPU people were _very_ nervous about overwriting 
instructions if another CPU might be just about to execute them.

Even the "overwrite only the first byte with 'int3'" made them go "umm, I
need to talk to some core CPU people to see if that's ok". They mumble
about possible CPU errata, I$ coherency, instruction retry etc.

I realize kprobes does this very thing, but kprobes is esoteric stuff and 
doesn't have much choice. In user space, you _could_ do the modification 
on a different physical page and then just switch the page table entry 
instead, and not get into the whole D$/I$ coherency thing at all.

Linus



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
 
 On Wed, 27 Jan 2010, Peter Zijlstra wrote:
  
  Right, so you're going to love uprobes, which does exactly that. The
  current proposal is overwriting the target instruction with an INT3 and
  injecting an extra vma into the target process's address space
  containing the original instruction(s) and possible jumps back to the
  old code stream.
 
 Just out of interest, how does it handle the threading issue?
 
 Last I saw, at least some CPU people were _very_ nervous about overwriting 
 instructions if another CPU might be just about to execute them.
 
 Even the "overwrite only the first byte with 'int3'" made them go "umm, I
 need to talk to some core CPU people to see if that's ok". They mumble
 about possible CPU errata, I$ coherency, instruction retry etc.
 
 I realize kprobes does this very thing, but kprobes is esoteric stuff and 
 doesn't have much choice. In user space, you _could_ do the modification 
 on a different physical page and then just switch the page table entry 
 instead, and not get into the whole D$/I$ coherency thing at all.

Right, so there's two aspects:

 1) concurrency when inserting the probe
 2) concurrency when hitting the probe

1) used to be dealt with by using utrace to stop all threads in the
process and then writing the instruction. I suggested to CoW the page,
modify the instruction, set the pagetable and flush tlbs at full speed
-- the very thing you suggest here.

2) so the traditional approach (and the intel arch manual describes this) is
to replace the instruction, single step it, and write the probe back. This
is racy for multi-threading. The current uprobes stuff solves this by
doing single-step-out-of-line (XOL).

XOL injects a new vma into the target process and puts the old
instruction there, then it single steps on the new location, leaving the
original site with INT3.

This doesn't work for things like RIP relative instructions, so uprobes
considers them un-probable.

Also, I myself really object to inserting a vma in a running process,
it's like a landlord, sure he has the key but he won't come in and poke
through your things.

The alternative is to place the instruction in TLS or stack space, since
each thread can only have a single trap at a time, you only need space
for 1 instruction (plus a possible jump out to the original site). There
is the 'problem' of marking the TLS/stack executable when being probed.

Then there is the whole emulation angle, the uprobes people basically
say it's too much effort to write an x86 emulator.



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 11:55 +0100, Peter Zijlstra wrote:
 Right, so there's two aspects:
 
  1) concurrency when inserting the probe
  2) concurrency when hitting the probe
 
 1) used to be dealt with by using utrace to stop all threads in the
 process and then writing the instruction. I suggested to CoW the page,
 modify the instruction, set the pagetable and flush tlbs at full speed
 -- the very thing you suggest here. 

Also, since executable maps are typically MAP_PRIVATE, you have to CoW
anyway in order to modify it and I would exclude MAP_SHARED from being
probable because then the modification could seep through into whatever
was backing that thing.
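
A toy Python model of the CoW-based insertion described above, under stated assumptions: `Mapping` and `insert_breakpoint` are hypothetical names, a `bytearray` stands in for a physical page, reassigning `mapping.page` stands in for switching the page table entry, and the TLB flush is not modeled at all.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class Mapping:
    """Toy model of one mapped text page in a process."""

    def __init__(self, file_page: bytearray, shared: bool):
        self.file_page = file_page    # backing page (think: page cache)
        self.shared = shared          # True models MAP_SHARED
        self.page = file_page         # what the page table currently points at


def insert_breakpoint(mapping: Mapping, offset: int) -> int:
    """Install INT3 via copy-on-write; return the displaced original byte."""
    if mapping.shared:
        # Writing here would leak into whatever backs the mapping.
        raise PermissionError("refusing to probe a shared mapping")
    private = bytearray(mapping.page)  # copy the whole page
    original = private[offset]
    private[offset] = INT3             # modify only the private copy
    mapping.page = private             # stands in for the page table switch
    return original


backing = bytearray([0x55, 0x89, 0xE5])
private_map = Mapping(backing, shared=False)
displaced = insert_breakpoint(private_map, 0)

try:
    insert_breakpoint(Mapping(bytearray([0x55]), shared=True), 0)
    shared_refused = False
except PermissionError:
    shared_refused = True
```

Because the new page is fully prepared before the mapping is switched, a thread in this model sees either the old page or the new one and never a half-written instruction, and the backing page is left unmodified.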



Re: linux-next: add utrace tree

2010-01-27 Thread Ananth N Mavinakayanahalli
On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote:
 On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
  
  On Wed, 27 Jan 2010, Peter Zijlstra wrote:
   
   Right, so you're going to love uprobes, which does exactly that. The
   current proposal is overwriting the target instruction with an INT3 and
   injecting an extra vma into the target process's address space
   containing the original instruction(s) and possible jumps back to the
   old code stream.
  
  Just out of interest, how does it handle the threading issue?
  
  Last I saw, at least some CPU people were _very_ nervous about overwriting 
  instructions if another CPU might be just about to execute them.
  
  Even the "overwrite only the first byte with 'int3'" made them go "umm, I
  need to talk to some core CPU people to see if that's ok". They mumble
  about possible CPU errata, I$ coherency, instruction retry etc.
  
  I realize kprobes does this very thing, but kprobes is esoteric stuff and 
  doesn't have much choice. In user space, you _could_ do the modification 
  on a different physical page and then just switch the page table entry 
  instead, and not get into the whole D$/I$ coherency thing at all.
 
 Right, so there's two aspects:
 
  1) concurrency when inserting the probe
  2) concurrency when hitting the probe
 
 1) used to be dealt with by using utrace to stop all threads in the
 process and then writing the instruction. I suggested to CoW the page,
 modify the instruction, set the pagetable and flush tlbs at full speed
 -- the very thing you suggest here.
 
 2) so traditionally (and the intel arch manual describes this) is to
 replace the instruction, single step it, and write the probe back. This
 is racy for multi-threading. The current uprobes stuff solves this by
 doing single-step-out-of-line (XOL).
 
 XOL injects a new vma into the target process and puts the old
 instruction there, then it single steps on the new location, leaving the
 original site with INT3.
 
 This doesn't work for things like RIP relative instructions, so uprobes
 considers them un-probable.

Probing RIP-relative instructions works just fine; there are fixups that
take care of it.

 Also, I myself really object to inserting a vma in a running process;
 it's like a landlord: sure, he has the key, but he won't come in and poke
 through your things.
 
 The alternative is to place the instruction in TLS or stack space, since
 each thread can only have a single trap at a time, you only need space
 for 1 instruction (plus a possible jump out to the original site). There
 is the 'problem' of marking the TLS/stack executable when being probed.
 
 Then there is the whole emulation angle: the uprobes people basically
 say it's too much effort to write an x86 emulator.

We don't need to write one. I don't know how easy it is to make the kvm
emulator less kvm-centric (vcpus, kvm_context, etc). Avi?

Ananth 
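For readers following the INT3 discussion, a minimal sketch of the bookkeeping involved: only the first byte of the probed instruction is overwritten with the one-byte breakpoint opcode (0xCC), and the original byte is kept so it can be restored or copied elsewhere. This works on an ordinary byte buffer, not on live code; the struct and function names are hypothetical, and none of the concurrency problems discussed in this thread are addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one probe; not a uprobes structure. */
struct probe {
    size_t  offset;   /* where the probe sits in the text buffer */
    uint8_t saved;    /* original first byte of the probed instruction */
    int     armed;    /* nonzero while the breakpoint is in place */
};

/* Overwrite only the first byte of the instruction at `offset`. */
void probe_arm(uint8_t *text, size_t len, struct probe *p, size_t offset)
{
    assert(offset < len);
    p->offset = offset;
    p->saved = text[offset];
    text[offset] = INT3_OPCODE;
    p->armed = 1;
}

/* Put the saved byte back, restoring the original instruction. */
void probe_disarm(uint8_t *text, struct probe *p)
{
    if (!p->armed)
        return;
    text[p->offset] = p->saved;
    p->armed = 0;
}
```

The remaining bytes of a multi-byte instruction are left untouched, which is why the single-byte store is the interesting case for cross-CPU safety.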



Re: linux-next: add utrace tree

2010-01-27 Thread Linus Torvalds


On Wed, 27 Jan 2010, Peter Zijlstra wrote:
 
 Right, so there's two aspects:
 
  1) concurrency when inserting the probe

That's the one I worried about. Stopping all threads will fix it, 
obviously at a disastrous performance cost, but what do I care? As noted, 
there are ways to do it safely with TLB switching, so it's fixable.

  2) concurrency when hitting the probe

Yeah, I didn't worry about this part, since the only solution is the 
out-of-line one, and I don't much care how the memory gets allocated for 
it. Inserting a whole new vma seems pretty drastic, but compared to 
stopping all threads, it's a small thing.

Linus



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote:
 Probing RIP-relative instructions works just fine; there are fixups that
 take care of it. 

Ah my bad then, it was my understanding you simply bailed on those.

Just for my information, how large are the replacement sequences?



Re: linux-next: add utrace tree

2010-01-27 Thread Ananth N Mavinakayanahalli
On Wed, Jan 27, 2010 at 12:08:31PM +0100, Peter Zijlstra wrote:
 On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote:
  Probing RIP-relative instructions works just fine; there are fixups that
  take care of it. 
 
 Ah my bad then, it was my understanding you simply bailed on those.
 
 Just for my information, how large are the replacement sequences?

The RIP relative instruction is transformed into indirect addressing
mode using a scratch register.

For details, see http://marc.info/?l=linux-kernel&m=126401936114639&w=2.

Ananth
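As background for why a RIP-relative instruction needs a fixup when it is executed from another location, here is a small sketch of the address arithmetic. On x86-64 the operand address is the address of the next instruction plus a signed 32-bit displacement, so the same bytes executed from a different address refer to a different target. The function name is hypothetical, and this is only the arithmetic, not the fixup code referenced above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Absolute address referenced by an x86-64 RIP-relative operand:
 * address of the *next* instruction plus the signed 32-bit displacement.
 */
uint64_t rip_rel_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For example, a 7-byte instruction at 0x400000 with displacement 0x10 refers to 0x400017; run unchanged from a copy at some other address, it would refer to a location shifted by the same distance as the copy.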



Re: linux-next: add utrace tree

2010-01-27 Thread Steven Rostedt
[ Added Arjan ]

On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
 
 On Wed, 27 Jan 2010, Peter Zijlstra wrote:
  
  Right, so you're going to love uprobes, which does exactly that. The
  current proposal is overwriting the target instruction with an INT3 and
  injecting an extra vma into the target process's address space
  containing the original instruction(s) and possible jumps back to the
  old code stream.
 
 Just out of interest, how does it handle the threading issue?
 
 Last I saw, at least some CPU people were _very_ nervous about overwriting 
 instructions if another CPU might be just about to execute them.

I think the issue was that ring 0 was never meant to do that, whereas
ring 3 does it all the time. Doesn't the dynamic library modify its
text?

-- Steve

 
 Even the "overwrite only the first byte with 'int3'" made them go "umm, I 
 need to talk to some core CPU people to see if that's ok". They mumble 
 about possible CPU errata, I$ coherency, instruction retry etc.
 
 I realize kprobes does this very thing, but kprobes is esoteric stuff and 
 doesn't have much choice. In user space, you _could_ do the modification 
 on a different physical page and then just switch the page table entry 
 instead, and not get into the whole D$/I$ coherency thing at all.
 
   Linus




Re: linux-next: add utrace tree

2010-01-27 Thread H. Peter Anvin
On 01/27/2010 02:43 AM, Linus Torvalds wrote:
 
 
 On Wed, 27 Jan 2010, Peter Zijlstra wrote:

 Right, so you're going to love uprobes, which does exactly that. The
 current proposal is overwriting the target instruction with an INT3 and
 injecting an extra vma into the target process's address space
 containing the original instruction(s) and possible jumps back to the
 old code stream.
 
 Just out of interest, how does it handle the threading issue?
 
 Last I saw, at least some CPU people were _very_ nervous about overwriting 
 instructions if another CPU might be just about to execute them.
 
 Even the "overwrite only the first byte with 'int3'" made them go "umm, I 
 need to talk to some core CPU people to see if that's ok". They mumble 
 about possible CPU errata, I$ coherency, instruction retry etc.
 

We actually went through a review of that here at Intel.  We do not yet
have an *official* answer (in order for us to have that we have to have
it approved by the architecture committee and published in the SDM), but
to the best of our current knowledge (and I'm allowed to say this) the
int3 method followed by global IPIs should be safe for modifying *one
(atomic) instruction*.  This is a specific case of a more general rule,
but I don't want to disclose the whole rule until it has been officially
approved.

 I realize kprobes does this very thing, but kprobes is esoteric stuff and 
 doesn't have much choice. In user space, you _could_ do the modification 
 on a different physical page and then just switch the page table entry 
 instead, and not get into the whole D$/I$ coherency thing at all.

On the more general rule of interpretation: I'm really concerned about
having a bunch of partially-capable x86 interpreters all over the
kernel.  x86 is *hard* to emulate, and it will only get harder as the
architecture evolves.

-hpa



Re: linux-next: add utrace tree

2010-01-27 Thread Jim Keniston
On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
...
 I think the best solution for user probes (by far) is to use a simplified
 in-kernel instruction emulator for the few commonly probed instructions.
 (Kprobes already partially decodes x86 instructions to make it safe to apply
 accelerated probes and there's other decoding logic in the kernel too.)
 
 The design and practical advantages are numerous:
 
  - People want to probe their function prologues most of the time ...
    a single INT3 there will in most cases just hit the initial stack
    allocation and that's it.

Yes, emulating push %ebp would buy us a lot of coverage for a lot of
apps on x86 (but see below**).  Even there, though, we'd have to address
the page fault we'd occasionally get when extending the stack vma.
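To make the "emulating push %ebp" case concrete, a minimal sketch of what such an emulation step amounts to on 32-bit x86: decrement the stack pointer by 4 and store the register at the new top of stack. The types and names are hypothetical, and the stack is modelled as a byte array; the -1 return stands in for the page fault mentioned above, which a real implementation would have to handle by extending the stack vma.

```c
#include <assert.h>
#include <stdint.h>

struct regs32 { uint32_t esp; uint32_t ebp; };

/* Hypothetical view of the mapped part of the stack: [base, base + size). */
struct stack_window { uint32_t base; uint32_t size; uint8_t *bytes; };

/* Emulate `push %ebp`. Returns 0 on success, -1 if the store would fault. */
int emulate_push_ebp(struct regs32 *r, struct stack_window *s)
{
    uint32_t new_esp = r->esp - 4;
    uint32_t off;

    assert(s->size >= 4);
    if (new_esp < s->base || new_esp - s->base > s->size - 4)
        return -1;                 /* registers are left untouched on failure */

    off = new_esp - s->base;
    /* x86 is little-endian: least significant byte at the lowest address. */
    s->bytes[off + 0] = (uint8_t)(r->ebp);
    s->bytes[off + 1] = (uint8_t)(r->ebp >> 8);
    s->bytes[off + 2] = (uint8_t)(r->ebp >> 16);
    s->bytes[off + 3] = (uint8_t)(r->ebp >> 24);
    r->esp = new_esp;
    return 0;
}
```

Even this trivial instruction needs a failure path, which gives a feel for how the work grows once more of the instruction set is covered.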

    We could get quite good coverage (and very fast
    emulation) for the common case in not too much code - and much of that
    code we already have available. No re-trapping,

As previously discussed, boosting would also get rid of the single-step
trap for most instructions.

 no extra instruction patching 

x86_64 rip-relative instructions are the only ones we alter.

and complex maintenance of trampolines.
 
  - It's as transparent as it gets - no user-space trampoline or other visible
state that modifies behavior or can be stomped upon by user-space bugs.

The XOL vma isn't writable from user space, so I can't think of how it
could be clobbered merely by a stray memory reference.  Yes, it's a vma
that the unprobed app would never have; and yes, a malicious app or
kernel module could remove it or alter the protection and scribble on
it.  We don't try to defend the app against such malicious attacks, but
we do our best to ensure that the kernel side handles such attacks
gracefully.

 
  - Lightweight and simple probe insertion: no weird setup sequence needing
    the stopping of all tasks to install the trampoline. We just add the INT3
    and off you go.

FWIW, we don't stop all threads to set up or extend the XOL vma, which
is typically a one-time event.  We just grab a mutex, in case multiple
threads hit previously-unhit probepoints simultaneously, and
simultaneously decide that the XOL area needs to be created or extended.

 
  - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on 
task local state.

The posted uprobes implementation is, so far as we can tell through code
inspection and testing, also thread-safe and SMP-safe.

 
  - The points we can probe are never truly limited as it's all freely
upscalable: if you cannot probe an instruction you want to probe today,
extend the emulator.

I don't see how ripping out existing support for almost* the entire
instruction set, and then putting it back instruction by instruction,
patch by patch, is a win.

Even if we add emulation, it seems sensible to keep the XOL approach as
a backup to handle instructions that aren't yet emulated (and
architectures that don't yet have emulators).  That way, if you don't
probe any unemulated instructions, the XOL vma is never created.

 Deny the rest. _All_ versions of uprobes code i've
seen so far already restricts the probe-compatible instruction set:

*Yes, we currently decline to probe some instructions that look
troublesome and we haven't taken the time to test.  These include things
like privileged instructions, int*, in*/out*, and instructions that fuss
with the segment registers.  We've never actually seen such instructions
in user apps.

RIP-relative instructions are excluded on 64-bit for example.

No.  As discussed in previous posts, we handle rip-relative
instructions.

 
  - Emulation has the _least_ semantical side effects as we really execute
'that' instruction -

It seems to me that emulation is the only approach that DOESN'T execute
the probed instruction.

 not some other instruction put elsewhere into a
special vma or into the process/thread stack, or some special in-kernel
trampoline, etc.
 
  - Emulation can be very fast for the common case as well. Nobody will probe
weird, complex instructions. They will use 'perf probe' to insert probes
into their functions 90% of the time ...
 
  - FPU and complex ops and pagefault emulation is not really what i'd expect
to be necessary for simple probing - but it _can_ be added by people who
care about it, if they so wish.

**In practice, we've had to probe all sorts of instructions, including
FP instructions -- especially where you want to exploit the debug info
to get the names, types, and locations of variables and args.  For some
compilers and architectures, the debug info isn't reliable until the end
of the function prologue, at which point you could find any old
instruction.  Ditto if you want to probe statements within a function.

 
 Such a scheme would be _far_ more preferable from a maintenance POV as well, 
 as the initial code will be small, and we can extend it gradually. All the 
 

Re: linux-next: add utrace tree

2010-01-26 Thread Pavel Machek
On Fri 2010-01-22 08:43:18, valdis.kletni...@vt.edu wrote:
 On Fri, 22 Jan 2010 10:51:39 +0530, Ananth N Mavinakayanahalli said:
 
  FWIW, Oleg's implementation of ptrace over utrace is 100% compatible
  with legacy ptrace; gdb testsuite indicates that
  (http://lkml.org/lkml/2009/12/21/98).
 
 No, that only proves it's compatible enough for gdb to not care. The problem
 is all those *other* packages that abuse ptrace in totally crackhead ways.
 
 (No, I can't name them - but ptrace is the sort of interface that almost
 encourages its use for things somewhere between crackhead and mad-scientist,
 so they're almost certainly out there.. WAY out there.. :)

strace, subterfugue, ltrace, ...? Plus various homegrown sandboxing tools...
Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: linux-next: add utrace tree

2010-01-26 Thread Ananth N Mavinakayanahalli
On Mon, Jan 25, 2010 at 01:41:57PM -0800, Linus Torvalds wrote:
 
 
 On Mon, 25 Jan 2010, Tom Tromey wrote:

...

  * Support displaced stepping in the kernel; I think this would improve
performance when debugging in non-stop mode.
 
 Don't we already do that at least on x86? Just doing a single-step should 
 work on an instruction even if it has a breakpoint on it, because we set 
 the TF bit.
 
 Or maybe I'm not understanding what displaced stepping means to you.

If Tom is referring to supporting single-stepping out of line, ie., not
putting back the original instruction at the bp location, yes, we
already support it on various architectures for kernel breakpoints,
through the kprobes infrastructure.

For userspace, there are more complications to take care of. We are
reworking a prototype based on community comments (see the long UBP/XOL
thread on lkml from a few days ago). Hopefully the userspace breakpoint
assistance layer will be generic enough for gdb to also take advantage
of, though the interface details need to be hashed out.

Ananth



Re: linux-next: add utrace tree

2010-01-26 Thread Frank Ch. Eigler
Hi -

On Mon, Jan 25, 2010 at 02:05:54PM -0700, Tom Tromey wrote:
 [...]
 Nevertheless, if the Linux kernel were to present a new user-space API,
 and if it had an advantage over ptrace, then we would port GDB to use
 it.  There are other platforms where, IIRC, we now use some /proc thing
 instead of ptrace.
 
 There are definitely things we would like from such an API.  Here's a
 few I can think of immediately, there are probably others.
 
 * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
   [...] Relatedly, don't mess with the inferior's parentage.

This is satisfied by the gdbstub prototype.

 * Support displaced stepping in the kernel [...]

I believe this is tantamount to hardware breakpoint support, which is
already present (via optional uprobes).

 * Support some kind of breakpoint expression in the kernel; this would
   improve performance of conditional breakpoints.  Perhaps the existing
   gdb agent expressions could be used.

This is in the todo list.


And that KILLER FEATURE of running strace plus gdb on the same
process?  It *already works* with the gdbstub, and unmodified strace +
gdb, thanks to utrace multiplexing process control.  It is still
artificially restricted in many ways, but this sort of thing is ready
for testing:

% process 
[1] 
% strace -o FILE -p  
% gdb process
(gdb) target remote /proc//gdb
(gdb) backtrace 
(gdb) cont
(gdb) ^D
%
[process continues]
% cat FILE
[...]
% kill 


- FChE



Re: linux-next: add utrace tree

2010-01-26 Thread Johannes Stezenbach
On Mon, Jan 25, 2010 at 04:07:21PM -0800, Linus Torvalds wrote:
 On Tue, 26 Jan 2010, Renzo Davoli wrote:
  
  The solution is that everybody can code his/her optimized kernel/user 
  interface for tracing in his/her kernel module, i.e. utrace.
 
 I don't think people understand. That is simply not a solution. That is 
 a PROBLEM. The thing you describe is an absolute disaster. Which is 
 exactly why I rant against it.
 
 The last thing we want to have is "here, take this, and make your own 
 kernel module mess around it optimized for your particular crazy 
 scenario".
 
 But every SINGLE post in this thread that has argued for utrace has argued 
 exactly this way. 

I haven't followed much of the utrace discussions, but my impression was
that utrace primarily is a cleanup effort, replacing "don't change it,
you might break it" code with a clean, well defined (and even documented)
implementation, to make it easier for people not familiar
with the low-level architecture details to experiment with
debugging stuff.

Two points to consider:

1. If you'd merge utrace + ptrace-on-utrace, but never anything else
   which uses the utrace API, wouldn't it still be an improvement?

2. A well defined utrace API makes debugging code more hackable, thus more
   likely that someone might come up with a brilliant killer debug
   feature in the future. (This might sound lame, but there are already
   a few people doing crazy things with utrace while I'm not aware
   that people have done such experiments based on the current ptrace impl.)

BTW, the ptrace improvements discussed elsewhere in this thread
(like using an fd instead of signals/wait) are orthogonal
to utrace, no?  IMHO it's a separate discussion.


Johannes



Re: linux-next: add utrace tree

2010-01-26 Thread Linus Torvalds


On Tue, 26 Jan 2010, Johannes Stezenbach wrote:
 
 1. If you'd merge utrace + ptrace-on-utrace, but never anything else
which uses the utrace API, wouldn't it still be an improvement?

I already said earlier that I'd be perfectly happy to merge utrace code, 
as long as it was clear that I'm not merging a platform for crazy work. 
IOW, the end result might be merging 99% of the code, but I want to set 
peoples _expectations_ right. I'm not at all interested in merging stuff 
that has various exported helper functions for people doing random things, 
but I could happily merge stuff that cleans up internal implementation.

 2. A well defined utrace API makes debugging code more hackable, thus more
likely that someone might come up with a brilliant killer debug
feature in the future.

I don't really agree. 

Clean code makes things easier to improve, and maybe utrace cleans things 
up. But defining new API's makes me very worried, and quite frankly, the 
last thing I ever want to see is a new interface that out-of-tree modules 
start using for random hacking.

So I'd be much happier without the whole utrace kernel interface and 
callbacks, and very much would want to avoid the whole issue of plugins. 
I'd like to see ptrace improvements - not something else.

In other words, I'd much much rather keep the utrace thing _internal_ to 
ptrace. If people have performance complaints about ptrace, let's look at 
fixing those _as_such_, rather than look at new modules etc.

 BTW, the ptrace improvements discussed elsewhere in this thread
 (like using an fd instead of signals/wait) are orthogonal
 to utrace, no?  IMHO it's a separate discussion.

Largely, yes. Tied together to some degree of course, but the whole issue 
of code cleanup can be seen as a reasonably independent first step (while 
moving to a fd-based interface should probably not be done without some 
cleanup first, so they _are_ somewhat tied together).

Linus



Re: linux-next: add utrace tree

2010-01-26 Thread Andi Kleen
Tom Tromey tro...@redhat.com writes:

 * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
   Internally we're already using a self-pipe to integrate this into
   gdb's main loop.  Relatedly, don't mess with the inferior's parentage.

How would having a kernel based solution be better over your
user space simulation?

BTW there's the new signalfd() system call that might do it
(haven't checked if it works for SIGCHLD)

 * Support displaced stepping in the kernel; I think this would improve
   performance when debugging in non-stop mode.

Not sure what displaced stepping is exactly, but it 
sounds like the branch tracing extensions that got added a 
few releases ago? On modern Intel chips they give you a branch
buffer in memory.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
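On the open question of whether signalfd() works for SIGCHLD: it does, provided the signal is blocked first. The sketch below (Linux-only; the function name is made up for the example) forks a child and learns of its exit by reading a signalfd rather than running a signal handler. It only shows the mechanics of receiving SIGCHLD through a descriptor; it does not change who is allowed to receive that signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `exit_code`, then learn about its exit by
 * reading a signalfd instead of running a SIGCHLD handler.
 * Returns the child's exit status, or -1 on any failure.
 */
int child_exit_via_signalfd(int exit_code)
{
    sigset_t mask, old;
    struct signalfd_siginfo si;
    int sfd, status, result = -1;
    pid_t pid;

    assert(exit_code >= 0 && exit_code <= 255);

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    /* The signal must be blocked, or it is handled the usual way. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0)
        return -1;

    sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd < 0)
        goto restore;

    pid = fork();
    if (pid == 0)
        _exit(exit_code);              /* child: exit immediately */
    if (pid < 0)
        goto close_fd;

    /* Blocks until SIGCHLD is queued for this process. */
    if (read(sfd, &si, sizeof(si)) == (ssize_t)sizeof(si) &&
        si.ssi_signo == SIGCHLD && si.ssi_pid == (uint32_t)pid &&
        waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

close_fd:
    close(sfd);
restore:
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

Note that the descriptor still only reports events for the caller's own children, and the child still has to be reaped with waitpid().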



Re: linux-next: add utrace tree

2010-01-26 Thread Linus Torvalds


On Tue, 26 Jan 2010, Andi Kleen wrote:

 Tom Tromey tro...@redhat.com writes:
 
  * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
Internally we're already using a self-pipe to integrate this into
gdb's main loop.  Relatedly, don't mess with the inferior's parentage.
 
 How would having a kernel based solution be better over your
 user space simulation?

Oh, the reason we should do something in the kernel is that you really 
can't do certain things with the ptrace() interface. 

For example, think about how Wine and UML use ptrace - and then realize 
that that makes it impossible to attach a debugger from the outside. 
That's a real deficiency in ptrace - much more so than the fact that there 
are some odd details (ie the whole read/write a word at a time is just a 
quirky detail in comparison - not a fundamental problem).

 BTW there's the new signalfd() system call that might do it
 (haven't checked if it works for SIGCHLD)

No, you miss the point.

The problem isn't that you want to turn signals into a file descriptor 
just because you like file descriptors.

The problem is that anything that is based on reparenting and signals is 
fundamentally a one parent only kind of interface. See?

So the reason I think using an fd is a good idea is _not_ because gdb 
already uses an fd internally, but because it gives you a connection 
between the debugger and debuggee that is not fundamentally limited to a 
single controller.

(It doesn't have to be a file descriptor, of course, but could be any kind 
of other model that allows multiple connections. It's just that in unix 
terms, using a file descriptor as the cookie for the connection is a 
very natural model. So the important part isn't the file descriptor 
itself, it's the model you could build).

Linus



Re: linux-next: add utrace tree

2010-01-26 Thread Andi Kleen
 The problem is that anything that is based on reparenting and signals is 
 fundamentally a one parent only kind of interface. See?

I was actually thinking about that before I wrote the email.

But when I did that i couldn't come up with a good scenario
where multiple debuggers actually make sense. In a sense
being a debugger is really a very intimate thing for a process. Do you
really want to have multiple of them messing with each other?

If yes how would they know what to touch and what not?

The only thing I could think of was user space virtualization
(like old UML) together with a real debugger, but frankly
these solutions all seemed like big race conditions to me
anyways and should be better done in the kernel or below it, 
so I have a hard time taking them seriously.

Can you think of any scenario where multiple debuggers
on a process make sense?

-Andi



Re: linux-next: add utrace tree

2010-01-26 Thread Oleg Nesterov
On 01/26, Linus Torvalds wrote:

 The problem is that anything that is based on reparenting and signals is
 fundamentally a one parent only kind of interface. See?

Indeed. signals + do_wait() is the horrible model.

 So the reason I think using an fd is a good idea is _not_ because gdb
 already uses an fd internally, but because it gives you a connection
 between the debugger and debuggee that is not fundamentally limited to a
 single controller.

 (It doesn't have to be a file descriptor, of course, but could be any kind
 of other model that allows multiple connections.

Yes.

But then we need something which represents this connection in kernel:
utrace_engine. Then we need something which allows multiple tracers to
cooperate. Just for example, one tracer wants to resume the tracee,
another tracer wants the tracee to be stopped. Utrace does this. And,
since we should preserve the current ptrace, the tracers should cooperate
with ptrace too.

IOW, this quickly leads to the new abstraction layer, I think. And of
course it is possible to implement this new model on top of utrace.

Yes, utrace itself comes with utrace_engine_ops vector to implement
whatever you like, perhaps you dislike this part.

Oleg.
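A rough sketch of the cooperation problem described above, where one tracer wants the tracee resumed and another wants it stopped. It assumes, purely for illustration, a policy in which the most restrictive request wins; the enum and function names are hypothetical and are not the utrace API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes, ordered from most to least restrictive. */
enum resume_action { ACT_STOP = 0, ACT_SINGLESTEP = 1, ACT_RESUME = 2 };

/* Combine two requests: the more restrictive (smaller) one wins. */
enum resume_action merge_actions(enum resume_action a, enum resume_action b)
{
    return a < b ? a : b;
}

/* Fold the requests of all attached tracers into one decision. */
enum resume_action decide(const enum resume_action *req, size_t n)
{
    enum resume_action out = ACT_RESUME;   /* nobody objects: keep running */
    size_t i;

    assert(req != NULL || n == 0);
    for (i = 0; i < n; i++)
        out = merge_actions(out, req[i]);
    return out;
}
```

Under this assumed policy the tracee stays stopped as long as any one tracer asks for a stop, and runs again only once every tracer has asked to resume.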



Re: linux-next: add utrace tree

2010-01-26 Thread Oleg Nesterov
On 01/26, Andi Kleen wrote:

 But when I did that i couldn't come up with a good scenario
 where multiple debuggers actually make sense. In a sense
 being a debugger is really a very intimate thing for a process. Do you
 really want to have multiple of them messing with each other?

 If yes how would they know what to touch and what not?

Yes, multiple debuggers can confuse each other if they change
the state of debuggee simultaneously. The user should do this ;)

 Can you think of any scenario where multiple debuggers
 on a process make sense?

Simple example. Try to debug/strace strace or gdb itself. Not trivial,
you can't attach to strace's tracees. Recently I spent 2 days trying to
understand why strace -f hangs. I was able to attach to strace, but
I wasn't able to see what its tracees do.

And, it was not possible to even trace strace until it hangs, with
ptrace the tracee (strace) must stop to report the event and this
shadowed the race.

Oleg.



Re: linux-next: add utrace tree

2010-01-26 Thread Andi Kleen
 Simple example. Try to debug/strace strace or gdb itself. Not trivial,
 you can't attach to strace's tracees. Recently I spent 2 days trying to
 understand why strace -f hangs. I was able to attach to strace, but
 I wasn't able to see what its tracees do.

But what would the semantics be inside the tracees even if you could?

 And, it was not possible to even trace strace until it hangs, with
 ptrace the tracee (strace) must stop to report the event and this
 shadowed the race.

Shadowing the race was the second surname of strace I thought anyways @)
Basically if you care about races never use strace in the first place.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.



Re: linux-next: add utrace tree

2010-01-26 Thread Tom Tromey
 Linus == Linus Torvalds torva...@linux-foundation.org writes:

Tom * Support displaced stepping in the kernel; I think this would improve
Tom performance when debugging in non-stop mode.

Linus Don't we already do that at least on x86?

I don't know.  If it does, and gdb does not yet use that, then that
would be worth changing.

Linus Or maybe I'm not understanding what displaced stepping means to you.

In non-stop mode (where you can stop one thread but leave the others
running), gdb wants to have the breakpoints always inserted.  So,
something must emulate the displaced instruction.

Tom



Re: linux-next: add utrace tree

2010-01-26 Thread Tom Tromey
Tom * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
Tom Internally we're already using a self-pipe to integrate this into
Tom gdb's main loop.  Relatedly, don't mess with the inferior's parentage.

Andi How would having a kernel based solution be better over your
Andi user space simulation?

Signals and wait are a pain because if we want to use some random
library in gdb, there might be conflicts.  This is true even if we use
signalfd.  An fd-for-debugging does not have this problem.  This matters
more now that we're letting people script gdb in python.

Tom



Re: linux-next: add utrace tree

2010-01-26 Thread Oleg Nesterov
On 01/26, Andi Kleen wrote:

  Simple example. Try to debug/strace strace or gdb itself. Not trivial,
  you can't attach to strace's tracees. Recently I spent 2 days trying to
  understand why strace -f hangs. I was able to attach to strace, but
  I wasn't able to see what its tracees do.

 But what would the semantics be inside the tracees even if you could?

In this particular case, all I need was something like gdb -p to
attach to the tracee, see the backtrace and detach.

  And, it was not possible to even trace strace until it hangs, with
  ptrace the tracee (strace) must stop to report the event and this
  shadowed the race.

 Shadowing the race was the second surname of strace I thought anyways @)
 Basically if you care about races never use strace in the first place.

Yes. And utrace doesn't require the tracee to be stopped to report the
event ;) Yes, yes, utrace can't fix strace in this sense automatically,
but still.

Oleg.



Re: linux-next: add utrace tree

2010-01-26 Thread Linus Torvalds


On Tue, 26 Jan 2010, Tom Tromey wrote:
 
 In non-stop mode (where you can stop one thread but leave the others
 running), gdb wants to have the breakpoints always inserted.  So,
 something must emulate the displaced instruction.

I'm almost totally uninterested in breakpoints that actually re-write 
instructions. It's impossible to do that efficiently and well, especially 
in threaded environments.

So if you do instruction rewriting, I can only say that's your problem.

But using the hardware breakpoints should automatically DTRT, both wrt 
threads _and_ wrt restarting. Sure, there's only a limited number of them, 
so if somebody wants more than that they are kind of screwed, but that's 
just how life is.

Linus



Re: linux-next: add utrace tree

2010-01-26 Thread Frank Ch. Eigler

tromey wrote:

 [...]
 In non-stop mode (where you can stop one thread but leave the others
 running), gdb wants to have the breakpoints always inserted.  So,
 something must emulate the displaced instruction.

This sounds like the sort of thing that kernel kprobes do, which the
uprobes patch does for userspace.  The gdbstub prototype can use
uprobes for such displaced breakpoints, and single-step-out-of-line
to execute them on a few platforms like x86-*.  This is already
prototyped / working.  (gdbstub currently restricts itself to
single-threaded programs only, but that's another todo.)

- FChE



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Sun, 24 Jan 2010, Kyle Moffett wrote:
 
 The point that's being missed is that there is a chicken-and-egg
 problem here.  The chicken is a replacement or extension to the
 debugger interface that would make it possible for me to do things
 like GDB a process while it's being strace'd or vice versa.  The egg
 is the utrace bits, an unstable but somewhat arch-generic ABI that
 abstracts out ptrace() to make it possible to stack both in-kernel and
 userspace debuggers/tracers/etc and have multiple simultaneous users.

Quite frankly, as far as I'm concerned, I'd be a whole lot more interested 
in utrace if its _only_ stated (and implied) goal was to do exactly this.

The thing I object to is the whole "dessert topping _and_ floor wax" 
thing, with kernel interfaces for random other users.

If somebody extended ptrace in good ways, that's a totally different 
thing. But I think utrace has been over-designed, possibly as a result of 
others coming in and saying "hey, I'd like to use that too for xyz".

Do one thing, and do it well. I'd not mind somebody improving ptrace 
(including extending its semantics - I do agree that the whole SIGSTOP 
thing makes it hard to have multiple debuggers).

That said, I also suspect that people should still look seriously at 
simply just improving ptrace. For example, I suspect that the biggest 
problem with ptrace is really just the signalling, and that creating a new 
extension for JUST THAT, and then having a model where you can choose - at 
PTRACE_ATTACH time - how to wait for events would be a good thing.

But as long as it is "I want to solve all problems", I'm not very 
impressed. 

Maybe somebody would be interested in trying to take the utrace 
improvements, and scaling down what they promise, and ignoring all input 
except for "I want to strace and gdb at the same time".

So stop the crazy "new kernel interfaces" crap. Stop the crazy "maybe we 
can use it for ftrace and generic user event tracing too". Stop the crazy.

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Frank Ch. Eigler
Hi -

On Mon, Jan 25, 2010 at 08:52:41AM -0800, Linus Torvalds wrote:

 [...]  If somebody extended ptrace in good ways, that's a totally
 different thing. But I think utrace has been over-designed, possibly
 as a result of others coming in and saying "hey, I'd like to use
 that too for xyz". [...]

Earlier, you said that you haven't followed utrace at all.  Upon
what real information do you infer that it has been over-designed?


- FChE



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Mon, 25 Jan 2010, Frank Ch. Eigler wrote:
 
 Earlier, you said that you haven't followed utrace at all.  Upon
 what real information do you infer that it has been over-designed?

Upon the information that people are talking about magic new kernel 
interfaces to do fancy things. And talking about doing things with it that 
are simply not relevant for ptrace/strace.

In fact, in this very thread I've been informed that there are no user 
interfaces to utrace at all, which to me says that it's been TOTALLY 
MISDESIGNED FROM THE VERY START, and has nothing to do with making ptrace 
work for strace/gdb at the same time.

In other words, I may not have followed utrace development, but I sure as 
hell can read. And everything I read about it just makes me less inclined 
to want to merge it. The people who argue for it are actually screwing 
themselves by arguing for all the wrong things, and making me convinced I 
don't want to touch it with a ten-foot pole.

If somebody were to argue that this is a simple series of patches to 
clean up ptrace and make it possible to strace a debugged process, then 
that would have been different. That's not what you or others have been 
doing. You've been pushing exactly the _reverse_ of that, namely how great 
it is for some random totally new features that I'm convinced aren't even 
used by a lot of people.

So give me a populist argument that makes sense for tons of actual users, 
not some f*cking "here's a cool infrastructure that developers can do 
random crazy out-of-tree crap with". Because I'm not interested in crazy 
developers.

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Mon, 25 Jan 2010, Linus Torvalds wrote:
 
 So give me a populist argument that makes sense for tons of actual users, 
 not some f*cking "here's a cool infrastructure that developers can do 
 random crazy out-of-tree crap with". Because I'm not interested in crazy 
 developers.

In other words, give me the killer feature. The thing I've asked for all 
the time. The thing that you seem to continually NOT EVEN UNDERSTAND.

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Steven Rostedt
On Mon, 2010-01-25 at 09:36 -0800, Linus Torvalds wrote:

  Because I'm not interested in crazy 
 developers.
 
   Linus


Uh oh, that's not good for us real-time folks.

http://lwn.net/Articles/357800/

And, according to Linus, the realtime people are crazy, so they can be
left to deal with the weird stuff.


-- Steve

(Sorry, I just couldn't resist)




Re: linux-next: add utrace tree

2010-01-25 Thread Alan Cox
 Uh oh, that's not good for us real-time folks.
 
 http://lwn.net/Articles/357800/
 
 And, according to Linus, the realtime people are crazy, so they can be
 left to deal with the weird stuff.

"I'd prefer the trees to be separate for testing purposes: it 
doesn't make much sense to have SMP support as a normal
kernel feature when most people won't have SMP anyway"
-- Linus Torvalds


Use cases got that into the tree pretty easily, I am sure RT ones will do
the same.



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Mon, 25 Jan 2010, Steven Rostedt wrote:
 
 Uh oh, that's not good for us real-time folks.
 
 http://lwn.net/Articles/357800/
 
 And, according to Linus, the realtime people are crazy, so they can be
 left to deal with the weird stuff.

The RT people have actually been pretty good at slipping their stuff in, 
in small increments, and always with good reasons for why they aren't 
crazy. 

Yeah, it's taken them years, and they still have out-of-tree stuff. And 
yeah, they had to change some things to make them more palatable to the 
mainline kernel - the whole fundamental raw spinlock change is just the 
most recent example of that.

But on the whole, I think it's actually worked out pretty well for them. I 
think the mainline kernel has improved in the process, but I also suspect 
that _their_ RT patches have also improved thanks to having to make the 
work more palatable to people like me who don't care all that deeply about 
their particular flavor of crazy.

And yeah, I still think the hard-RT people are mostly crazy. 

So I can work with crazy people, that's not the problem. They just need to 
_sell_ their crazy stuff to me using non-crazy arguments, and in small and 
well-defined pieces. When I ask for killer features, I want them to lull 
me into a safe and cozy world where the stuff they are pushing is actually 
useful to mainline people _first_.

In other words, every new crazy feature should be hidden in a nice solid 
Trojan Horse gift: something that looks _obviously_ good at first sight. 

The fact that it may contain the germs for future features should be 
hidden so well that not only is it not used as an argument ("Hey, look at 
all those soldiers in that horse, imagine what you could do with them"), 
it should also not be obvious from the source code ("Look at all those 
hooks I sprinkled around, which aren't actually used by anything, but just 
imagine what you could do with them").

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Steven Rostedt
On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote:

 But on the whole, I think it's actually worked out pretty well for them. I 
 think the mainline kernel has improved in the process, but I also suspect 
 that _their_ RT patches have also improved thanks to having to make the 
 work more palatable to people like me who don't care all that deeply about 
 their particular flavor of crazy.

Actually this is an understatement. Every feature (and I do mean
_every_) that went from -rt into mainline underwent 3 or more rewrites
before it was acceptable for mainline. And every time, the end result
made the -rt patch set better as a whole.

Not to mention, that a lot of the early stuff also cleaned up mainline.
You can't have Real-Time without having a clean kernel. And as you
stated, a lot of those patches to clean up the kernel, no one even knew
that the real reason was to help the -rt patch set. They were well
disguised Trojan horses.

Darn, it looks like you are onto our scheme.

-- Steve




Re: linux-next: add utrace tree

2010-01-25 Thread Thomas Gleixner
On Mon, 25 Jan 2010, Steven Rostedt wrote:

 On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote:
 
  But on the whole, I think it's actually worked out pretty well for them. I 
  think the mainline kernel has improved in the process, but I also suspect 
  that _their_ RT patches have also improved thanks to having to make the 
  work more palatable to people like me who don't care all that deeply about 
  their particular flavor of crazy.
 
 Actually this is an understatement. Every feature (and I do mean
 _every_) that went from -rt into mainline, undertook 3 or more rewrites
 before it was acceptable for mainline. And every time, the end result
 made the -rt patch set better as a whole.
 
 Not to mention, that a lot of the early stuff also cleaned up mainline.
 You can't have Real-Time without having a clean kernel. And as you
 stated, a lot of those patches to clean up the kernel, no one even knew
 that the real reason was to help the -rt patch set. They were well
 disguised Trojan horses.

Tsss. Never admit such things.

 Darn, it looks like you are onto our scheme.

Which scheme ? The only Trojan horses in the kernel tree are in
drivers/char/drivers/char/tty_io.c which put Linus himself into
Linux-0.98.2 :)

tglx



Re: linux-next: add utrace tree

2010-01-25 Thread Ingo Molnar

* Thomas Gleixner t...@linutronix.de wrote:

 On Mon, 25 Jan 2010, Steven Rostedt wrote:
 
  On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote:
  
   But on the whole, I think it's actually worked out pretty well for them. 
   I think the mainline kernel has improved in the process, but I also 
   suspect that _their_ RT patches have also improved thanks to having to 
   make the work more palatable to people like me who don't care all that 
   deeply about their particular flavor of crazy.
  
  Actually this is an understatement. Every feature (and I do mean _every_) 
  that went from -rt into mainline, undertook 3 or more rewrites before it 
  was acceptable for mainline. And every time, the end result made the -rt 
  patch set better as a whole.
  
  Not to mention, that a lot of the early stuff also cleaned up mainline. 
  You can't have Real-Time without having a clean kernel. And as you stated, 
  a lot of those patches to clean up the kernel, no one even knew that the 
  real reason was to help the -rt patch set. They were well disguised Trojan 
  horses.
 
 Tsss. Never admit such things.

Here are four examples of recent kernel features:

 - lockdep                                      [1]
 - ftrace                                       [2]
 - new-style generic mutexes and spin-mutexes   [3]
 - the new arch/x86 tree                        [4]

I suspect few would guess that all of these features were motivated by the -rt 
kernel originally:

[1] lockdep started out as the 'track irqs-off sections' patches in -rt
[2] ftrace started out as -rt's latency tracer and logdev
[3] mutex.c was motivated by rtmutex.c
[4] arch-x86 was motivated by annoyance with needless porting of -rt 
features from 32-bit to 64-bit x86 and back.

[ Nor would you normally guess that Linux itself was motivated by a guy 
  wanting to toy around with 32-bit x86 assembly ;-) ]

Various forms of craziness that motivate us don't really hurt, as long as the 
process is rooted in reality. We can 'wish' for the crazier future stuff and 
can help it indirectly, and sometimes it might even happen down the road - but 
reality and common-sense utility is what controls.

And note that there's nothing dishonest about doing multi-purpose patches, as 
long as the mainstream purpose isn't really just a decoy. When we decouple a 
feature from -rt we usually forget its -rt purpose and the intermediate 
for-mainstream forms aren't even useful for -rt - back-integration into -rt 
comes at a later stage. This makes it doubly sure that it's all formed by 
mainstream's needs, not -rt's needs.

In the few cases where the -rt role is prominent for some weird reason we 
declare it as such. It's the exception to the rule really - few useful kernel 
features are single purpose. ( When they are then we are likely doing 
something wrong. -rt _is_ a special case. )

Ingo



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Mon, 25 Jan 2010, Mark Wielaard wrote:
 
 And all these users have wishes to extend the current ptrace interface
 mess. But nobody dares to extend ptrace in any direction because
 fixing/cleaning up one of these use cases might break the others in
 subtle and not so subtle ways. Which is why the utrace series of patches
 is cleaning up all this stuff first.

I call bullshit.

You can clean up ptrace without introducing odd new interfaces and trying 
to sell it as some revolutionary new kernel interface that can do 
anything.

I also call bullshit on the "ptrace() is so horribly nasty" argument. Yes, 
I've seen the code that uses ptrace in user space, and yes, it's nasty, 
but it's invariably _not_ nasty so much because ptrace itself is nasty, 
but because it's full of #ifdef so-and-so-os/so-and-so-arch, and the code 
is never cleaned up.

There are a couple of obvious cases of ptrace being 
uglier-than-it-needs-to-be. Like the traditional ptrace read/write 
interface being purely word at a time, and that clearly is not pretty. 
Several architectures already do "copy range" kind of versions on it, 
though, so that's just a detail, and if anybody wanted to clean it up, 
they could have.
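
To make the cost of the word-at-a-time interface concrete, here is a small simulation in Python. It does not call ptrace at all: `FakeTracee`, `peek_word` and `read_block` are hypothetical names invented for this sketch, with `peek_word` standing in for one PTRACE_PEEKDATA-style request and `read_block` for a "copy range" style request. The 8-byte word size is an assumption (a 64-bit machine).

```python
class FakeTracee:
    """Stand-in for a traced process: its memory is just a bytes object."""

    def __init__(self, memory, word_size=8):
        self.memory = memory
        self.word_size = word_size  # assumed 64-bit word
        self.calls = 0              # how many requests the "debugger" issued

    def peek_word(self, addr):
        """Model of one word-sized read request (one system call)."""
        self.calls += 1
        chunk = self.memory[addr:addr + self.word_size]
        return chunk.ljust(self.word_size, b"\0")

    def read_block(self, addr, length):
        """Model of a hypothetical block-copy request: one call, any length."""
        self.calls += 1
        return self.memory[addr:addr + length]


def read_word_at_a_time(tracee, addr, length):
    """Assemble `length` bytes out of repeated single-word requests."""
    out = bytearray()
    offset = 0
    while offset < length:
        out += tracee.peek_word(addr + offset)
        offset += tracee.word_size
    return bytes(out[:length])


if __name__ == "__main__":
    page = bytes(range(256)) * 16          # one 4 KiB "page" of tracee memory
    slow = FakeTracee(page)
    fast = FakeTracee(page)
    assert read_word_at_a_time(slow, 0, len(page)) == fast.read_block(0, len(page))
    print("word-at-a-time requests:", slow.calls)
    print("block-copy requests:", fast.calls)
```

Under these assumptions, reading a single 4 KiB page takes 512 requests one word at a time, against one request for a block copy.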

The more fundamental problem is the use of signals (while at the same time 
wanting to _trap_ non-ptrace signals), without any model for a connection 
state, which is why you can have only one tracer. But again, that's 
largely a user interface issue, and apparently utrace does _nothing_ for 
that problem at all.

So I do agree that ptrace is not a great interface. However: repeating 
that statement over and over in _no_ way excuses some totally unrelated 
code that doesn't have anything what-so-ever to do with the actual 
problems of ptrace.

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Tom Tromey
 "Linus" == Linus Torvalds torva...@linux-foundation.org writes:

Linus> No. There is absolutely _no_ reason to believe that gdb et al would ever 
Linus> delete the ptrace interfaces anyway. 

Yes, in GDB we approximately never delete anything.

Nevertheless, if the Linux kernel were to present a new user-space API,
and if it had an advantage over ptrace, then we would port GDB to use
it.  There are other platforms where, IIRC, we now use some /proc thing
instead of ptrace.

There are definitely things we would like from such an API.  Here's a
few I can think of immediately, there are probably others.

* Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
  Internally we're already using a self-pipe to integrate this into
  gdb's main loop.  Relatedly, don't mess with the inferior's parentage.

* Support displaced stepping in the kernel; I think this would improve
  performance when debugging in non-stop mode.

* Support some kind of breakpoint expression in the kernel; this would
  improve performance of conditional breakpoints.  Perhaps the existing
  gdb agent expressions could be used.
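
The self-pipe arrangement mentioned in the first item can be sketched as follows. This is a generic illustration in Python, not gdb's actual code: the function name `wait_for_child_via_self_pipe` and the exit status 7 are made up for the example. It shows how a SIGCHLD is turned into a readable file descriptor that an ordinary select()-based main loop can wait on, and it assumes a POSIX system and that it runs in the main thread.

```python
import os
import select
import signal


def wait_for_child_via_self_pipe(timeout=5.0):
    """Fork a child and learn of its exit through a pipe fd in a select() loop."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # the signal handler must never block

    def on_sigchld(signum, frame):
        # Convert the asynchronous signal into one byte on the pipe.
        try:
            os.write(write_fd, b"x")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(7)  # child: exit at once with a recognisable status

        # The "main loop": read_fd could sit next to sockets, a tty, etc.
        while True:
            ready, _, _ = select.select([read_fd], [], [], timeout)
            if not ready:
                raise TimeoutError("no state change reported by the child")
            os.read(read_fd, 1)  # consume one wakeup byte
            reaped, status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                return os.WEXITSTATUS(status)
    finally:
        # Restore the previous disposition and release the pipe.
        if old_handler is None:
            old_handler = signal.SIG_DFL
        signal.signal(signal.SIGCHLD, old_handler)
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("child exit status:", wait_for_child_via_self_pipe())
```

The workaround still depends on SIGCHLD and waitpid() underneath, which is why an interface that hands the debugger a descriptor directly would make the pipe and the handler unnecessary.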

Tom



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Mon, 25 Jan 2010, Tom Tromey wrote:
 
 There are definitely things we would like from such an API.  Here's a
 few I can think of immediately, there are probably others.
 
 * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb.
   Internally we're already using a self-pipe to integrate this into
   gdb's main loop.  Relatedly, don't mess with the inferior's parentage.

As I kind of alluded to elsewhere, I heartily agree with this. The really 
major design mistake of ptrace (as opposed to just various ugly corners) 
is how it has no connection information, and that ends up being one of the 
main reasons why you can't have two ptracers working on the same thing.

(There are other things that complicate that too, of course, like simply 
just trying to manage various per-thread state like debug registers etc, 
but that's a separate class of complications).

 * Support displaced stepping in the kernel; I think this would improve
   performance when debugging in non-stop mode.

Don't we already do that at least on x86? Just doing a single-step should 
work on an instruction even if it has a breakpoint on it, because we set 
the TF bit.

Or maybe I'm not understanding what displaced stepping means to you.

 * Support some kind of breakpoint expression in the kernel; this would
   improve performance of conditional breakpoints.  Perhaps the existing
   gdb agent expressions could be used.

I suspect it might be reasonable to do simple expressions on breakpoints, 
but not the kind of things gdb exports to users. IOW, maybe you could have 
a single conditional on a single value (register or memory) associated 
with an expression.
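
As a rough illustration of how small such a condition could be, here is a sketch in Python of "a single conditional on a single value". Everything in it is hypothetical: `BreakCondition`, `should_stop`, the register name and the `read_mem` callback are invented for the example and do not correspond to any existing ptrace request or gdb agent expression.

```python
import operator
from dataclasses import dataclass

# The only comparisons the sketch allows; anything else is rejected.
_COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class BreakCondition:
    kind: str      # "reg" for a register, "mem" for a memory word
    where: object  # register name, or memory address
    op: str        # one of the keys of _COMPARISONS
    constant: int  # value to compare against


def should_stop(cond, regs, read_mem):
    """Return True if the breakpoint hit should be reported to the debugger."""
    if cond.op not in _COMPARISONS:
        raise ValueError("unsupported comparison: %r" % cond.op)
    if cond.kind == "reg":
        value = regs[cond.where]
    elif cond.kind == "mem":
        value = read_mem(cond.where)
    else:
        raise ValueError("unsupported operand kind: %r" % cond.kind)
    return _COMPARISONS[cond.op](value, cond.constant)


if __name__ == "__main__":
    regs = {"rax": 5}
    memory = {0x1000: 3}
    cond = BreakCondition("reg", "rax", "==", 5)
    print(should_stop(cond, regs, memory.__getitem__))
```

The point of keeping it to one operand, one comparison and one constant is that the check is trivially bounded, unlike the full expression language a debugger exposes to its users.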

Regardless, internally to the kernel your two later issues are details. 
The "how to connect to the debuggee" is a much more fundamental issue, and 
has the biggest design/interface impact. The other would likely just be 
new ptrace command extensions that somebody would have to just implement 
the grotty details on.

Linus



Re: linux-next: add utrace tree

2010-01-25 Thread Renzo Davoli
Let me add my two euro-cents to this discussion.

Mark Wielaard m...@redhat.com:
 Unfortunately ptrace does all that magic already (badly). People don't
 just use it for (s)tracing syscalls, but also for tracing signals, for
  single step debugging and poking at memory, register state, for process
 jailing and virtualization (uml) through syscall emulation.
 So when they are talking about these fancy things that is because that
 is what ptrace gives them currently. And they hate it, because the
 ptrace interface is such a pain to work with. And all these things don't
 really work together. You cannot trace, emulate, debug, jail at the same
 time.
I support Mark's words. I don't use ptrace for debugging/tracing and I
have experienced severe limitations of the ptrace interface.
(I have tried to post some extensions for ptrace to overcome some 
constraints, see my posts on ptrace_vm or ptrace_multi on LKML).

Oleg Nesterov, writing to Andrew Morton said:
 First of all, utrace makes other things possible.  gdbstub,
 nondestructive core dump, uprobes, kmview, hopefully more.  I didn't
 look at these projects closely, perhaps other people can tell more.  As
 for their merge status, until utrace itself is merged it is very hard to
 develop them out of tree.

The list above also includes kmview, which is a creation of mine.
umview and kmview are partial virtual machines: processes running
in a [uk]mview machine can have their own view of the file system, 
networking support, user-id, system-name, etc.
A [uk]mview machine virtualizes just what the user needs: the filesystem
or just a subtree/some subtrees, or networking, or one/some
virtual devices, etc. The view provided by a [uk]mview machine can be
a composition of real resources (provided by the Linux kernel) and
virtual resources.

Each system call request gets hijacked to a module of [uk]mview when
it refers to a virtual resource. The request is forwarded to the kernel
otherwise.

umview is based on ptrace, kmview uses a kernel module based on utrace.
(umview is included in debian lenny (to sid), tutorial and manuals in 
wiki.virtualsquare.org)

IMHO utrace is better than ptrace (or an optimized version of it):
1 - Frank Ch. Eigler wrote: 
 At least one reason is that ptrace is single-usage-only, so for
 example you cannot concurrently debug & strace the same program.
  - exactly. utrace allows multiple tracing engines, this means that kmview 
  machines can be nested (in a natural way, no extra code is needed for
  this feature). In the same way strace/gdb can run on virtualized processes, 
  too.
2 - The kmview kernel module implements several optimizations
  to minimize the number of requests forwarded to the kmview process
  (the virtual machine monitor). kmview is just a module using the
  utrace interface; prior attempts at an optimized umview required kernel patches.
  Like kmview, any other service requiring process tracing can include 
  specific optimizations in its own kernel module.
  On the other hand, all these services could use the standardized utrace
  interface for their optimizations, instead of asking for messy patches 
  to change code all around the kernel source.
3 - ptrace takes SIGSTOP/SIGCONT for its own management. Strace/gdb and
  umview cannot be transparent for programs using these signals.

Oleg Nesterov, talking about ptrace, said:
 Of course they can't use other interfaces, we don't have them. And
 without the new abstraction layer we will never have, I think.
I agree.

The following list includes the execution times I got in a recent test 
(make vde-2, see http://www.cs.unibo.it/~renzo/view-os-lk2009.pdf):
plain kernel 22.7s, 
kmview (no modules) 23.9s (+5.5%), 
full kmview (modules loaded, all syscall virtualized) 38.5s (+70%),
optimized umview 51.0s (+124%), 
umview on vanilla kernel 75.7s (+233%).
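
The percentages in the list are overheads relative to the plain-kernel run, i.e. (t - 22.7) / 22.7. A short Python sketch that recomputes them from the times as printed above (the dictionary keys and the function name `overhead_percent` are just labels chosen for this example):

```python
BASELINE_SECONDS = 22.7  # plain kernel, from the list above

TIMINGS_SECONDS = {
    "kmview (no modules)": 23.9,
    "full kmview": 38.5,
    "optimized umview": 51.0,
    "umview on vanilla kernel": 75.7,
}


def overhead_percent(seconds, baseline=BASELINE_SECONDS):
    """Slowdown relative to the baseline run, as a percentage."""
    return (seconds - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, seconds in TIMINGS_SECONDS.items():
        print("%-26s %5.1fs  +%.1f%%" % (name, seconds, overhead_percent(seconds)))
```

Recomputed from the rounded times, the overheads come out at about 5.3%, 69.6%, 124.7% and 233.5%, which is within a fraction of a percentage point of the figures quoted in the list; the small differences presumably come from rounding in the reported times.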

utrace can be used to speed up virtualization (at least in my case
it worked in this way). 
Performance can be useful for debugging but it is a major issue for
virtualization.
Kmview module provides optimizations to select the system call requests 
depending on the syscall number, the pathnames or the file descriptors. 
http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications
Trying to add all the optimizations needed by different projects to ptrace is a
never-ending nightmare: the LKML will continue to receive patch proposals
for ptrace... 
The solution is that everybody can code his/her optimized kernel/user 
interface for tracing in his/her kernel module, i.e. utrace.

renzo



Re: linux-next: add utrace tree

2010-01-25 Thread Linus Torvalds


On Tue, 26 Jan 2010, Renzo Davoli wrote:
 
 The solution is that everybody can code his/her optimized kernel/user 
 interface for tracing in his/her kernel module, i.e. utrace.

I don't think people understand. That is simply not a solution. That is 
a PROBLEM. The thing you describe is an absolute disaster. Which is 
exactly why I rant against it.

The last thing we want to have is "here, take this, and make your own 
kernel module mess around it optimized for your particular crazy 
scenario".

But every SINGLE post in this thread that has argued for utrace has argued 
exactly this way. 

Linus



Re: linux-next: add utrace tree

2010-01-24 Thread tytso
On Sat, Jan 23, 2010 at 09:04:56PM -0800, Linus Torvalds wrote:
  The killer app for this will be the ability to delete thousands of
  lines of code from GDB, strace, and all the various other tools that
  have to painfully work around the major interface gotchas of ptrace(),
  while at the same time making their handling of complex processes much
  more robust.
 
 No. There is absolutely _no_ reason to believe that gdb et al would ever 
 delete the ptrace interfaces anyway. 

More to the point, gdb *couldn't* use utrace, because utrace only
exports a kernel API; not a syscall interface.  And if the Red Hat
Toolchain folks are thinking about encouraging gdb to start creating
out-of-tree kernel modules, so that (a) gdb requires root privs, and
(b) gdb is as (un)stable as SystemTap with respect to development
kernels by making it dependent on internal kernel API's, the Red Hat
Toolchain group needs to be smacked upside the head...

   - Ted



Re: linux-next: add utrace tree

2010-01-24 Thread Frank Ch. Eigler
Hi -

On Sun, Jan 24, 2010 at 05:25:13AM -0500, ty...@mit.edu wrote:
 [...]
   The killer app for this will be the ability to delete thousands of
   lines of code from GDB, strace, and all the various other tools that
   have to painfully work around the major interface gotchas of ptrace(),
   while at the same time making their handling of complex processes much
   more robust.
  
  No. There is absolutely _no_ reason to believe that gdb et al would ever 
  delete the ptrace interfaces anyway. 
 
 More to the point, gdb *couldn't* use utrace, because utrace only
 exports a kernel API; not a syscall interface.

Yes, this might explain why Kyle wrote:

   [...] I believe that utrace is the kernel side of that
   API. [...]

 And if the Red Hat Toolchain folks are thinking about encouraging
 gdb to start creating out-of-tree kernel modules [...]  the Red Hat
 Toolchain group needs to be smacked upside the head...

Those keeping up will note that an ordinary in-tree, non-modular,
non-root-only, already-works-with-standard-gdb,
potentially-better-than-ptrace debugger interface has already been
prototyped & posted on lkml as an RFC.


- FChE



Re: linux-next: add utrace tree

2010-01-24 Thread Thomas Gleixner
On Sat, 23 Jan 2010, Frank Ch. Eigler wrote:
 On Sat, Jan 23, 2010 at 07:04:01AM +0100, Ingo Molnar wrote:
 
  [...]  Also, if any systemtap person is interested in helping us
  create a more generic filter engine out of the current ftrace filter
  engine (which is really a precursor of a safe, sandboxed in-kernel
  script engine), that would be excellent as well. [...]
 
 Thank you for the invitation.
 
  More could be done - a simple C-like set of functions perhaps - some minimal 
  per probe local variable state, etc. (perhaps even looping as well, with a 
  limit on number of predicate executions per filter invocation.)
 
 Yes, at some point when such a bytecode interpreter gets rich enough, one
 may not need the translated-to-C means of running scripts.
 
 
  ( _Such_ a facility, could then perhaps be used to allow applications 
    access to safe syscall sandboxing techniques: i.e. a programmable 
    seccomp concept in essence, controlled via ASCII space filter 
    expressions [...]
    IMHO that would be a superior concept for security modules too [...]
 
  [...]  specific functionality with an immediately visible upside,
  with no need for opaque hooks.
 
 This OTOH seems like rather a stretch.  If one claims that opaque
 hooks are bad, so instead have hooks that jump not to auditable C
 code but a bytecode interpreter?  And have the bytecodes be uploaded
 from userspace?  How is this supposed to produce transparency from
 the kernel/hook point of view?

Simply because the kernel controls which byte code is executed and has
control over the functionality behind it. That makes the hooks well
defined and transparent.
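
To illustrate the argument, here is a toy filter interpreter in Python. It is only a sketch of the general idea (a fixed opcode set chosen by the interpreter's owner, plus a step budget so that loops cannot run away); the opcode names, `run_filter`, `FilterError` and the event fields are invented for this example and are not the ftrace filter engine or any kernel interface.

```python
class FilterError(Exception):
    """Raised when a filter program is rejected or exceeds its budget."""


def run_filter(program, event, max_steps=64):
    """Evaluate a tiny stack-based filter program against one event.

    program: list of (opcode, argument) pairs
    event:   dict of named integer fields the filter may read
    """
    stack = []
    pc = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise FilterError("step budget exceeded")
        op, arg = program[pc]
        pc += 1
        if op == "push":            # push a constant
            stack.append(arg)
        elif op == "load":          # push a named event field
            if arg not in event:
                raise FilterError("unknown field %r" % (arg,))
            stack.append(event[arg])
        elif op in ("eq", "lt", "and", "or"):
            if len(stack) < 2:
                raise FilterError("stack underflow")
            b = stack.pop()
            a = stack.pop()
            if op == "eq":
                stack.append(a == b)
            elif op == "lt":
                stack.append(a < b)
            elif op == "and":
                stack.append(bool(a) and bool(b))
            else:
                stack.append(bool(a) or bool(b))
        elif op == "not":
            if not stack:
                raise FilterError("stack underflow")
            stack.append(not stack.pop())
        elif op == "jmp":           # unconditional jump; loops hit the budget
            if not isinstance(arg, int) or not 0 <= arg <= len(program):
                raise FilterError("bad jump target")
            pc = arg
        else:
            # Anything not listed above simply does not exist for the filter.
            raise FilterError("unknown opcode %r" % (op,))
    return bool(stack and stack[-1])


if __name__ == "__main__":
    # "syscall == 2 and fd < 3"
    prog = [("load", "syscall"), ("push", 2), ("eq", None),
            ("load", "fd"), ("push", 3), ("lt", None), ("and", None)]
    print(run_filter(prog, {"syscall": 2, "fd": 1}))
```

Whoever uploads the program can only combine the operations the interpreter implements, and a program that loops forever is cut off after `max_steps` instructions.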

Thanks,

tglx



Re: linux-next: add utrace tree

2010-01-24 Thread Frank Ch. Eigler
Hi -

tytso wrote:
 [...]

Let me see if I can paraphrase those of your concerns that were substantive:

1) That if utrace is merged, and systemtap keeps on using it, there may be
   some sort of chilling effect on kernel developers that would impede
   utrace's future development.

This might sound plausible to an outsider, but luckily we're not stuck
with having to speculate: one can examine history.  Systemtap has been
around, working roughly the same way, for about *five years*.

Systemtap modules use more than a handful of mainstream
module-accessible kernel services.  During all this time, how many
examples have there been when systemtap developers have pleaded
with lkml to avoid changing some prior interface?  How many of those
successfully?  (That last one is a trick question, since both numbers
are really close to *zero*.)  How much real impediment to change has
our mere existence caused?


2) That systemtap is not portable to all kernel versions.

Problems do periodically occur.  However, one can again refer to
historical facts to assess whether in fact they warrant long term
grudges.  In every release note, we list the range of kernel versions
we test against.  We may have one of the broadest ranges of support,
2.6.9 through to many current -rc*s and non-linus trees.  We have
several mechanisms which let us easily adapt to most changes.  It may
interest readers to find out that the number of systemtap changes we
have had to add on account of kernel changes is on the order of a *few
per year*.  The usual turnaround, once reported, is on the order of a
*few days*.


3) That systemtap users will complain to kernel developers if
   systemtap becomes incompatible.

Let's go to the historical record again.  How many such complaints
have actually been seen in inappropriate fora such as lkml?  How
difficult were they to diagnose / redirect to the proper venue?  Have
they constituted a loss of face for kernel developers?


4) That systemtap is almost but not quite as evil as nvidia.

It seems factors like ...

- always being a completely open source project
- keeping in regular contact with lkml and other constituencies
- not being related to essential hardware enablement, so users
  not wanting it don't have to touch it
- the compile-to-C approach being technologically necessary since
  there was no alternative plausible way at the time (and still now)
- repeatedly offering infrastructure code with non-stap uses

... all add up to a mere nudge away from entirely evil.  If so, I
wonder if your sort of grossly bimodal view of ethical virtue is going
to foster the right sorts of change in the linux kernel community.


- FChE



Re: linux-next: add utrace tree

2010-01-24 Thread Chris Moller

On 01/24/10 13:01, Frank Ch. Eigler wrote:


... all add up to a mere nudge away from entirely evil.  If so, I
wonder if your sort of grossly bimodal view of ethical virtue is going
to foster the right sorts of change in the linux kernel community.
   


Nothing like a good religious debate to liven up your Sunday...



- FChE

   




Re: linux-next: add utrace tree

2010-01-24 Thread Kyle Moffett
On Sat, Jan 23, 2010 at 14:48,  ty...@mit.edu wrote:
 The fundamental issue which Ingo is trying to say (and which you
 apparently don't seem to be understanding) is that utrace doesn't
 export a syscall (which is an ABI that we are willing to promise will
 be stable), but rather a set of kernel API's (which we never promise
 to be stable),

The point that's being missed is that there is a chicken-and-egg
problem here.  The chicken is a replacement or extension to the
debugger interface that would make it possible for me to do things
like GDB a process while it's being strace'd or vice versa.  The egg
is the utrace bits, an unstable but somewhat arch-generic ABI that
abstracts out ptrace() to make it possible to stack both in-kernel and
userspace debuggers/tracers/etc and have multiple simultaneous users.

 and the fact that there will be out-of-tree programs
 that are going to be trying to depend on that interface (much like
 Systemtap does today when it creates kernel modules) is something that
 is considered on par with Nvidia trying to ship proprietary video
 drivers.

Ugh... perhaps we should derive a variation of Godwin's law for this:

"As an LKML discussion grows longer, the probability of an unfavorable
comparison involving nVidia or Microsoft approaches 1."


 If you want to try to slide utrace in, such that we're able to ignore
 the fact that there will be this external house that will be built on
 quicksand, pointing at how nice the external house will be isn't going
 to be helpful.  Nor is pointing at the ability that other people will
 be able to build other really nice houses on the aforementioned
 quicksand (i.e., out-of-tree kernel modules that depend on kernel
 API's).

Personally I don't give a flying  about SystemTap; I'm interested
in things like the ability to stack gdb with strace, the RFC gdb-stub
posted a week ago, etc.  None of those abilities would be out-of-tree
modules at all, and therefore the quicksand analogy is specious.


 A simple "code cleanup" argument is not carrying the day ("Look!  We
 can clean up the ptrace code!").  It's going to have to be a **really**
 cool in-tree kernel functionality that provides a killer feature (in
 Linus's words), enough so that people are willing to overlook the fact
 that there's this monster external out-of-tree project that wants to
 depend on API's that may not be stable, and which, even if the
 developers don't grump at us, users will grump at us when we change
 API's that we had never guaranteed will be stable, and then Systemtap
 breaks.

I would be willing to guess that something like 95% of the people
using SystemTap or other tools are doing so on Red Hat Enterprise
Linux or other enterprise supported platforms, and so when something
breaks they go whinge at Red Hat, etc.  If I recall correctly Red Hat
and many of the other vendors already heavily fiddle with kernel
patches they apply to provide some amount of binary module
compatibility.


 This is probably why Ingo invited you to think about ways of doing
 some kind of safe in-kernel bytecode approach.  That has the advantage
 of doing away with external kernel modules, with all of their many
 downsides: its dependency on unstable kernel API's, the fact that many
 financial customers have security policies that prohibit C compilers
 on production machines, the inherent security risk of allowing
 external random kernel modules to be delivered and loaded into a
 system, etc.

There are substantial non-SystemTap uses for utrace that would *not*
be satisfied by an in-kernel bytecode approach, starting with
stacking debuggers and tracers.  Furthermore, let's say they did go
off and build the in-kernel bytecode interpreter.  I can pretty much
guarantee that people would say the hooks into the rest of the kernel
are too invasive and they should be abstracted out into an API.  *This
is that API!*

Cheers,
Kyle Moffett



Re: linux-next: add utrace tree

2010-01-24 Thread Ananth N Mavinakayanahalli
On Sat, Jan 23, 2010 at 12:23:33PM +0100, Ingo Molnar wrote:
 
 * Kyle Moffett k...@moffetthome.net wrote:
 
  On Fri, Jan 22, 2010 at 19:22, Linus Torvalds
  torva...@linux-foundation.org wrote:
 
...

 In that sense it might be better to fix/enhance ptrace, if there's interest. 
 I've written a handful of ptrace extensions in the past (none of them went 
 upstream tho), it can be done in a useful manner and the code is pretty 
 hackable. There are basic problems left to be solved: for example why is there
 still no 'memory block copy' call, why are we _still_ limited to one word per
 system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has 
 PTRACE_WRITE*/READ* support that implements this, but none of the other 
 architectures have it so it's essentially unused.
 
 Or another possible direction would be to extend the perf events syscall with 
 interception capabilities. It's far more performant at extracting application 
 state without scheduling than any ptrace method - and interception/injection 
 would be a natural next step - if there's interest.

This certainly is now a chicken and egg problem. Everybody agrees that
Linux needs something better than ptrace; legacy ptrace will continue to
live, so will utilities written to it (strace, etc).

But should that limit what Linux can offer? What's the way out?

- Enhance ptrace: At least one ptrace maintainer (Roland) had publicly
  stated he doesn't prefer enhancing legacy ptrace -- that it's already a
  beast to maintain, and adding more complexity to it does it no good.

- Extend perf; would perf then use utrace underneath? Or would one have
  to redo some of what utrace already does for thread level control?

- Give utrace a syscall and make it the primary way for users to
  interact with the layer. There are benefits to this if there is
  agreement on the utrace layer itself, maybe with less flexibility than
  what it currently offers? If yes, what should it look like?

Any new debug facility will have to incorporate some or most learnings
from what utrace tried to address. It would be sad to just dump utrace
and redo everything from scratch or band-aid existing interfaces.

Ananth



Re: linux-next: add utrace tree

2010-01-24 Thread tytso
On Sun, Jan 24, 2010 at 08:42:13PM -0500, Kyle Moffett wrote:
 
 Personally I don't give a flying  about SystemTap; I'm interested
 in things like the ability to stack gdb with strace, the RFC gdb-stub
 posted a week ago, etc.  None of those abilities would be out-of-tree
 modules at all, and therefore the quicksand analogy is specious.

Great.  So what should be reviewed is utrace *plus* these other
userland interfaces, which may get critiqued and improved, and utrace
patches can be reviewed in light of these new features.  But be
warned: if it turns out that only 30% of utrace is needed to
support gdb stacking with strace, etc., the other 70% will likely get
ejected and the utrace patches streamlined to support these in-tree
users.  But since you don't give a flying  about SystemTap,
presumably you won't mind, right?


 I would be willing to guess that something like 95% of the people
 using SystemTap or other tools are doing so on Red Hat Enterprise
 Linux or other enterprise supported platforms, and so when something
 breaks they go whinge at Red Hat, etc.  If I recall correctly Red Hat
 and many of the other vendors already heavily fiddle with kernel
 patches they apply to provide some amount of binary module
 compatibility.

Sure, but as out-of-tree modules, the best they can expect is that
most kernel developers will pretend that they don't exist.  Which is
OK; when I tried using SystemTap, most of the concerns which I
expressed as being critical for kernel developers were largely ignored
(as near as I could tell) because the target market was RHEL corporate
customers, and they prioritized their resourcing accordingly --- so
they shouldn't mind if kernel developers return the favor.

But that means that we should only merge those portions of utrace that
are needed for these alleged killer new features, and only if these
new features are cool enough that they justify the new code on their
own merits.   At least, IMNSHO.

- Ted



Re: linux-next: add utrace tree

2010-01-23 Thread Alexey Dobriyan
On Sat, Jan 23, 2010 at 2:22 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 This is why when somebody brought up "you could do a seccomp-like thing on
 top of utrace" that my reaction was and is just totally negative. It shows
 all the wrong kinds of tying things together.

seccomp-via-utrace should be just removed to be honest before its users.
It entered the tree because it was very small and simple.
If rewritten, it no longer is small and simple because of whole kernel/utrace.c.



Re: linux-next: add utrace tree

2010-01-23 Thread Alan Cox
 The killer app for this will be the ability to delete thousands of
 lines of code from GDB, strace, and all the various other tools that
 have to painfully work around the major interface gotchas of ptrace(),
 while at the same time making their handling of complex processes much
 more robust.

Years ago (and it really must be years ago because this was about the
time I started hacking on Linux stuff !) there was a proposal to extract
and sanitize the arch specific stuff in binutils and in gdb etc into
sensible libraries that could be used by other apps.

What I don't understand is why that doesn't solve 99% of your problem.
ptrace is not perfect but most of the real ptrace limitations actually
come about because either the CPU can't do something or because the
supporting logic would be too expensive - things like having extra
private debugger pages.

Yes ptrace needs a lot of icky support code, but it's already been
written...

Alan



Re: linux-next: add utrace tree

2010-01-23 Thread Ingo Molnar

* Kyle Moffett k...@moffetthome.net wrote:

 On Fri, Jan 22, 2010 at 19:22, Linus Torvalds
 torva...@linux-foundation.org wrote:
  There are cases where we really _want_ to have common code. We want to
  have a common VFS interface because we want to show _one_ interface to
  user space across a gazillion different filesystems. We want to have a
  common driver layer (as far as possible) because - again - we expose a
  metric shitload of drivers, and we want to have one unified interface to
  them.
 
 So... Everybody agrees that ptrace() is horrible and a royal pain to use, 
 let alone use correctly and without bugs.  Everybody also agrees that 
 ptrace() needs to stay around for a long time to avoid breaking all the 
 existing users.
 
 Now how do we get from here to a moderately portable API for interrogating, 
 controlling, and intercepting process state? Essentially it would need to 
 support all of the things that a powerful debugger would want to do, 
 including modifying registers and memory, substituting syscall return 
 values, etc.  I believe that utrace is the kernel side of that API.

The problem is, utrace does not do that really.

What utrace does is that it provides an opaque set of APIs for unspecified and 
out of tree _kernel_ modules (such as systemtap). It doesn't support any 
'application' per se. It basically removes the kernel's freedom at shaping its 
own interaction with debug applications.

If utrace was a 'better ptrace' syscall, where the syscall itself is the goal 
of the hookery, it would all be rather different. People could argue about 
_that_ interface (and the hooks would be a pure kernel internal 
implementational detail - not an interface specification), and once people 
agree about that ABI and there's enough application momentum behind it, the 
hooks are really not that opaque anymore - they are for that ABI and not more.

Note that it's still a _big_ hurdle: it's hard to agree on a new syscall and 
it's hard to get 'application momentum' behind it. Special Linux system calls 
have a checkered past, they tend to not be used by much of anything, and thus 
they tend to be a breeding ground of bugs, maintenance complexity and 
security problems. Lack of attention is never good.

In that sense it might be better to fix/enhance ptrace, if there's interest. 
I've written a handful of ptrace extensions in the past (none of them went 
upstream tho), it can be done in a useful manner and the code is pretty 
hackable. There are basic problems left to be solved: for example why is there 
still no 'memory block copy' call, why are we _still_ limited to one word per 
system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has 
PTRACE_WRITE*/READ* support that implements this, but none of the other 
architectures have it so it's essentially unused.
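
To make the one-word-per-call limitation concrete, here is a minimal C sketch (not code from this thread) of how a debugger has to copy a block out of a tracee with PTRACE_PEEKDATA. The names peek_block, peek_word_ptrace and peek_word_local are made up for illustration; only ptrace() and PTRACE_PEEKDATA are real. The reader is passed in as a function pointer so the copy loop can be tried against local memory without attaching to anything. It assumes an already-attached, stopped tracee for the ptrace-backed reader, and it always fetches a whole word even for a trailing partial read.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Reads one word at addr; returns 0 on success, -1 on error. */
typedef int (*peek_word_fn)(void *ctx, unsigned long addr, long *out);

/* ptrace-backed reader: ctx points at the pid of an attached, stopped
 * tracee. PTRACE_PEEKDATA returns the word itself, so errno is the only
 * way to tell a legitimate -1 word from a failure. */
int peek_word_ptrace(void *ctx, unsigned long addr, long *out)
{
    pid_t pid = *(pid_t *)ctx;
    long word;

    errno = 0;
    word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

/* Stand-in reader that treats addr as a pointer in this process, so the
 * copy loop can be exercised without a tracee. */
int peek_word_local(void *ctx, unsigned long addr, long *out)
{
    (void)ctx;
    memcpy(out, (const void *)addr, sizeof(*out));
    return 0;
}

/* Copy len bytes starting at addr, one word per call to peek.
 * Returns the number of peek calls made, or -1 on error. */
long peek_block(peek_word_fn peek, void *ctx,
                unsigned long addr, void *buf, size_t len)
{
    unsigned char *dst = buf;
    long calls = 0;

    while (len > 0) {
        long word;
        size_t chunk = len < sizeof(word) ? len : sizeof(word);

        if (peek(ctx, addr, &word) != 0)
            return -1;
        memcpy(dst, &word, chunk);
        calls++;
        addr += chunk;
        dst += chunk;
        len -= chunk;
    }
    return calls;
}
```

With peek_word_ptrace as the reader, every loop iteration is a full system call: copying one 4096-byte page costs 512 calls on a machine where long is 8 bytes, which is the overhead a block-copy request would remove.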

Or another possible direction would be to extend the perf events syscall with 
interception capabilities. It's far more performant at extracting application 
state without scheduling than any ptrace method - and interception/injection 
would be a natural next step - if there's interest.

Thanks,

Ingo



Re: linux-next: add utrace tree

2010-01-23 Thread Frank Ch. Eigler
Hi -

mingo wrote:
 [...]
  Now how do we get from here to a moderately portable API for interrogating, 
  controlling, and intercepting process state? Essentially it would need to 
  support all of the things that a powerful debugger would want to do, 
  including modifying registers and memory, substituting syscall return 
  values, etc.  I believe that utrace is the kernel side of that API.
 
 The problem is, utrace does not do that really.

In fact, it is exactly designed for that.

 What utrace does is that it provides an opaque set of APIs for
 unspecified and out of tree _kernel_ modules (such as systemtap). It
 doesn't support any 'application' per se. It basically removes the
 kernel's freedom at shaping its own interaction with debug
 applications.

This claim is hard to take any more seriously than emoting that the
blockio layer is opaque because device drivers remove freedom for
the kernel to shape its interaction with hardware.  If you have any
*real evidence* about how any present user of utrace misuses that
capability, or interferes with the kernel's freedom, show us please.


- FChE



Re: linux-next: add utrace tree

2010-01-23 Thread Frank Ch. Eigler
Hi -

On Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox wrote:
 [...]
 What I don't understand is why [libgdb?] doesn't solve 99% of your problem.
 ptrace is not perfect but most of the real ptrace limitations actually
 come about because either the CPU can't do something or because the
 supporting logic would be too expensive - things like having extra
 private debugger pages.

At least one reason is that ptrace is single-usage-only, so for
example you cannot concurrently debug and strace the same program.
OTOH, utrace is designed to permit clean nesting/sharing semantics for
concurrent debugger-type tools operating on the same processes.

- FChE



Re: linux-next: add utrace tree

2010-01-23 Thread Arnaldo Carvalho de Melo
Em Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox escreveu:
 Years ago (and it really must be years ago because this was about the
 time I started hacking on Linux stuff !) there was a proposal to extract
 and sanitize the arch specific stuff in binutils and in gdb etc into
 sensible libraries that could be used by other apps.

Hallelujah if it had happened at that time, but sadly... :-(
 
- Arnaldo



Re: linux-next: add utrace tree

2010-01-23 Thread tytso
On Sat, Jan 23, 2010 at 06:47:29AM -0500, Frank Ch. Eigler wrote:
  What utrace does is that it provides an opaque set of APIs for
  unspecified and out of tree _kernel_ modules (such as systemtap). It
  doesn't support any 'application' per se. It basically removes the
  kernel's freedom at shaping its own interaction with debug
  applications.
 
 This claim is hard to take any more seriously than emoting that the
 blockio layer is opaque because device drivers remove freedom for
 the kernel to shape its interaction with hardware.  If you have any
 *real evidence* about how any present user of utrace misuses that
 capability, or interferes with the kernel's freedom, show us please.

The fundamental issue which Ingo is trying to say (and which you
apparently don't seem to be understanding) is that utrace doesn't
export a syscall (which is an ABI that we are willing to promise will
be stable), but rather a set of kernel API's (which we never promise
to be stable), and the fact that there will be out-of-tree programs
that are going to be trying to depend on that interface (much like
Systemtap does today when it creates kernel modules) is something that
is considered on par with Nvidia trying to ship proprietary video
drivers.  

(OK, maybe not *quite* as evil as Nvidia because at least SystemTap is
open source, but the bottom line is that enabling out-of-tree modules
isn't considered a good thing, and if we know in advance that there
are out-of-tree modules, there is a strong tendency to want to nip
those in the bud.)

The reason why I avoid Nvidia hardware like the plague is because I
work on bleeding-edge kernels, and even though companies like Nvidia
and Broadcom try very hard to keep up with released upstream kernels,
#1, there is always the concern of what happens if they decide to
change that policy, and #2, invariably something will break during the
-rc1 or -rc2 stage, and then my laptop is useless for running bleeding
edge kernels.  It's one of the reasons why many kernel developers gave
up on SystemTap, because it's not something that can be trusted to be
there, and the fault is not on our changing the API's, it's on
SystemTap depending on API's that were never guaranteed to be stable
in the first place.

If you want to try to slide utrace in, such that we're able to ignore
the fact that there will be this external house that will be built on
quicksand, pointing at how nice the external house will be isn't going
to be helpful.  Nor is pointing at the ability that other people will
be able to build other really nice houses on the aforementioned
quicksand (i.e., out-of-tree kernel modules that depend on kernel
API's).

A simple code cleanup argument is not carrying the day (Look!  We
can clean up the ptrace code!).  It's going to have to be a **really**
cool in-tree kernel functionality that provides a killer feature (in
Linus's words), enough so that people are willing to overlook the fact
that there's this monster external out-of-tree project that wants to
depend on API's that may not be stable, and which, even if the
developers don't grump at us, users will grump at us when we change
API's that we had never guaranteed will be stable, and then Systemtap
breaks.

This is probably why Ingo invited you to think about ways of doing
some kind of safe in-kernel bytecode approach.  That has the advantage
of doing away with external kernel modules, with all of their many
downsides: their dependency on unstable kernel API's, the fact that many
financial customers have security policies that prohibit C compilers
on production machines, the inherent security risk of allowing
external random kernel modules to be delivered and loaded into a
system, etc.

  - Ted



Re: linux-next: add utrace tree

2010-01-23 Thread Linus Torvalds


On Sat, 23 Jan 2010, Kyle Moffett wrote:
 
 Now how do we get from here to a moderately portable API for
 interrogating, controlling, and intercepting process state?

Umm? ptrace?

It's not _pretty_, but it's a hell of a lot more portable than utrace is 
ever going to be. Yes, the details differ between OS's (and between 
architectures), but let's face it, things like register state probing is 
_never_ going to be portable across different architectures simply because 
the register state isn't the same.

 The killer app for this will be the ability to delete thousands of
 lines of code from GDB, strace, and all the various other tools that
 have to painfully work around the major interface gotchas of ptrace(),
 while at the same time making their handling of complex processes much
 more robust.

No. There is absolutely _no_ reason to believe that gdb et al would ever 
delete the ptrace interfaces anyway. 

That really is my point. Adding a new interface, when an old and crufty 
(but working) interface is inevitably going to be around anyway - and is 
inevitably always going to have portability issues - is STUPID.

Let's take strace, for example.

Yes, ptrace() is crufty, but have you actually looked at strace source 
code? The problem isn't really a crufty interface to read registers etc, 
the bigger problem for strace is that different architectures and OS's 
have different system call argument rules, different ways to read/write 
system call numbers yadda yadda yadda.

Take a look at strace sources some day. Moving away from ptrace on Linux 
(even if you decided that you don't care about old versions of the kernel 
that don't know anything else) would simplify ABSOLUTELY NOTHING.

Really. Quite the reverse, I suspect. The Solaris and FreeBSD support uses 
ptrace too, afaik, so you'd just be confusing the issue.

And the fact is, strace would still end up supporting ptrace anyway, just 
so that you could run it on old kernels.

So the whole "making a new utrace interface would simplify things" is 
simply a total lie. The fact that ptrace is a bit of an odd interface IN 
NO WAY means that any other interface would end up being appreciably 
simpler.

It would just result in _more_ code in strace, and more confusion.

Linus



Re: linux-next: add utrace tree

2010-01-22 Thread Oleg Nesterov
On 01/21, Linus Torvalds wrote:

 On Thu, 21 Jan 2010, Andrew Morton wrote:
 
  ptrace is a nasty, complex part of the kernel which has a long history
  of problems, but it's all been pretty quiet in there for the past few
  years.

 More importantly, we're not ever going to get rid of it.

Unfortunately, you are right. The current ptrace (as it is visible from
user-space) should stay forever.

 Quite frankly, judging by all past history we have ever seen in kernel
 interfaces, new and non-portable interfaces simply are never used. The
 whole question whether they are nicer or not is entirely immaterial.

I have to admit this point looks very reasonable to me. Except, can't
resist, ptrace itself is hardly portable.

 I'm personally very dubious that there are any merits to utrace that
 outweigh the very clear disadvantages: just another layer that adds a new
 level of abstraction to the only interface that people actually _use_,
 namely ptrace.

Of course they can't use other interfaces, we don't have them. And
without the new abstraction layer we will never have, I think.

Oleg.



Re: linux-next: add utrace tree

2010-01-22 Thread Frank Ch. Eigler
Hi -

oleg wrote:

 [...]
 I'm personally very dubious that there are any merits to utrace that
 outweigh the very clear disadvantages: just another layer that adds a new
 level of abstraction to the only interface that people actually _use_,
 namely ptrace.

 Of course they can't use other interfaces, we don't have them. And
 without the new abstraction layer we will never have, I think.

This is one of the reasons we built, upon request of lkml people, the
utrace-gdbstub prototype (http://lkml.org/lkml/2009/11/30/173).  It
presents a standard userspace debugging interface -- actually, more
standard than ptrace!  It has the potential to be more powerful
feature-wise and perhaps even perform faster than ptrace.  And yet
that RFC didn't receive any on-topic review, only wishes for
unspecified blue-sky integration with kernel debugging.

So then there's uprobes, which is another potential utrace killer
app, if it weren't so tainted by some peoples' disdain for its
current user, when other users are already being seriously discussed.
So a working prototype, which demonstrates both the utility of utrace
itself and the end-user value of user-space probing, is disregarded.

And there are several smaller utrace clients in the works, each of
them merge candidates in the future.  Yes, most of them may be
rewritten with special-purpose hook after hook as people reinvent the
utrace wheel piece by piece, but how long will that take?  How is the
opportunity cost of missing features valued?

Finally, I don't know how to address the logic of "if a feature
requires utrace, that's a bad argument for utrace" and at the same
time "you need to show a killer app for utrace".  What could possibly
satisfy both of those constraints?  Please advise.


- FChE



Re: linux-next: add utrace tree

2010-01-22 Thread Peter Zijlstra
On Fri, 2010-01-22 at 15:01 -0500, Frank Ch. Eigler wrote:
 So then there's uprobes, which is another potential utrace killer
 app

That's bollocks, uprobes is an utter and total mis-match for utrace.
Probing userspace is primarily about DSOs which is files and vma's, not
tasks.

You might maybe want a utrace interface to that, but that is largely
non-interesting.

IOW, we don't need utrace to make sensible use of uprobes.

(And when I speak of uprobes I mean the thing formerly called UBP)



Re: linux-next: add utrace tree

2010-01-22 Thread Oleg Nesterov
On 01/21, Linus Torvalds wrote:

 I realize that my argument is very antithetical to the normal CS teaching
 of "general-purpose is good". I often feel that very specific code with
 very clearly defined (and limited) applicability is a good thing - I'd
 rather have just a very specific ptrace layer that does nothing but
 ptrace, than a generic plugin layer that can be layered under ptrace and
 other things.

I am repeating the same (and probably poor) arguments, but we don't have a
clearly defined ptrace layer. The current code is just the set of precedents,
I mean, this code does this because we always did this for unknown reason.
And we can't fix it without breaking things. Even the obvious bugs which
could be fixed by the very simple patch should be preserved sometimes.
In fact, afaics the current state is: if it can't crash the kernel - it is
not a bug.

Otoh, ptrace is very limited, yes. Imho - too limited. And, as a user-space
api, it is just horrible.

However: we're not ever going to get rid of it. Yes, sure.


But I am afraid this all is almost off-topic. Afaik, utrace was not created
to solve the problems with ptrace, at least I am sure this wasn't the only
goal.

Unfortunately, I didn't participate in other projects which use utrace.
Even if I did, I don't know how could I prove they are important enough
to have a generic layer to make other things possible.

Oleg.



Re: linux-next: add utrace tree

2010-01-22 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 22, 2010 at 09:16:16PM +0100, Peter Zijlstra wrote:
 [...]
  So then there's uprobes, which is another potential utrace killer
  app

 That's bollocks, uprobes is an utter and total mis-match for utrace.
 Probing userspace is primarily about DSOs which is files and vma's,
 not tasks. [...]

Your experience with user-space probing apparently differs from ours.
In fact there exists plenty of interest and utility in probing given
processes only, if for no other reason than to avoid disrupting others
running on the machine.

Nearly always, it is better to build a multiprocess probing widget
from multiply-applied single-process ones, rather than to build
single-process probing from grossly-filtered systemwide/VMA ones.
(If the lower level infrastructure provides both options, groovy.)

- FChE



Re: linux-next: add utrace tree

2010-01-22 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 22, 2010 at 01:59:11PM -0800, Linus Torvalds wrote:
 [...]
  Finally, I don't know how to address the logic of "if a feature
  requires utrace, that's a bad argument for utrace" and at the same
  time "you need to show a killer app for utrace".  What could possibly
  satisfy both of those constraints?  Please advise.
 
 The point is, the feature needs to be a killer feature. And I have yet to 
 hear _any_ such killer feature, especially from a kernel maintenance 
 standpoint.


 The "better ptrace than ptrace" is irrelevant. Sure, we all know ptrace 
 isn't a wonderful feature. But it's there, and a debugger is going to have 
 support for it anyway, so what's the _advantage_ of a better ptrace 
 interface? There is absolutely _zero_ advantage, there's just yet 
 another interface. We can't get rid of the old one _anyway_.

The point is that the intermediate api will allow (and, as the part
you clipped out about utrace-gdbstub said, *already has allowed*)
alternative plausible interfaces that coexist just fine.


 And the seccomp replacement just sounds horrible. Using some tracing 
 interface to implement security models sounds like the worst idea ever.

So all this is about *naming* utrace?  It was never built for
tracing, but for (efficient/multiplexed) *control*.  That wasn't even
its original name -- one of your lieutenants asked roland to change it
to utrace.


 And like it or not, over the last almost-decade, _not_ having to
 have to work with system tap has been a feature, not a problem, for
 the kernel community.

I don't have a problem with that.  We have apprx. never imposed
anything on developers who didn't want to use it.  There are plenty
who have and will.


- FChE



Re: linux-next: add utrace tree

2010-01-22 Thread Linus Torvalds


On Fri, 22 Jan 2010, Frank Ch. Eigler wrote:
 
 The point is that the intermediate api will allow (and, as the part
 you clipped out about utrace-gdbstub said, *already has allowed*)
 alternative plausible interfaces that coexist just fine.

And my point is that multiple interfaces are BAD. 

There is one interface we _have_ to have: the traditional ptrace one. That 
one we can't get away from.

Multiple interfaces on its own is just confusion with no upside. 

You need a _reason_ to have other interfaces. They need to have that 
killer feature. Just being different is not a feature at all.

 So all this is about *naming* utrace?  It was never built for
 tracing, but for (efficient/multiplexed) *control*.  That wasn't even
 its original name -- one of your lieutenants asked roland to change it
 to utrace.

No. It's not about naming. It's about the downside of having amorphous 
interfaces that apparently don't even have rules, and are then used to 
implement random crap.

Yes, the SNL skit about "It's a dessert topping _and_ a floor wax" was 
funny, but it was funny exactly because it was crazy.

The fact that you can do crazy things is not a good thing. You need to 
find the goodness somewhere else, and that's what I'm trying to tell 
you.

You just seem to have trouble listening. 

Linus



Re: linux-next: add utrace tree

2010-01-22 Thread Linus Torvalds


On Fri, 22 Jan 2010, Linus Torvalds wrote:

 No. It's not about naming. It's about the downside of having amorphous 
 interfaces that apparently don't even have rules, and are then used to 
 implement random crap.
 
 Yes, the SNL skit about "It's a dessert topping _and_ a floor wax" was 
 funny, but it was funny exactly because it was crazy.

Put yet another way: I'd _much_ rather have two totally separate pieces 
that don't depend on each other, and do different things.

So to take a very practical example: I'd much rather have 'seccomp' and 
'ptrace' that have _nothing_ what-so-ever to do with each other, than have 
some intermediate layer that then needs to make both of those happy, and 
that both have to interact with.

There are cases where we really _want_ to have common code. We want to 
have a common VFS interface because we want to show _one_ interface to 
user space across a gazillion different filesystems. We want to have a 
common driver layer (as far as possible) because - again - we expose a 
metric shitload of drivers, and we want to have one unified interface to 
them.

But going the other way: trying to share code when the interfaces are 
fundamentally _different_ is generally not at all such a great idea. It 
ends up tying two conceptually totally separate things together, and 
suddenly people who work on feature X need to modify infrastructure that 
affects feature Y, and it turns out that details A, B and C are all totally 
different for the two features and the middle layer has two conflicting 
things it needs to work with.

This is why when somebody brought up "you could do a seccomp-like thing on 
top of utrace" that my reaction was and is just totally negative. It shows 
all the wrong kinds of tying things together.

Linus



Re: linux-next: add utrace tree

2010-01-22 Thread Kyle Moffett
On Fri, Jan 22, 2010 at 19:22, Linus Torvalds
torva...@linux-foundation.org wrote:
 There are cases where we really _want_ to have common code. We want to
 have a common VFS interface because we want to show _one_ interface to
 user space across a gazillion different filesystems. We want to have a
 common driver layer (as far as possible) because - again - we expose a
 metric shitload of drivers, and we want to have one unified interface to
 them.

So... Everybody agrees that ptrace() is horrible and a royal pain to
use, let alone use correctly and without bugs.  Everybody also agrees
that ptrace() needs to stay around for a long time to avoid breaking
all the existing users.

Now how do we get from here to a moderately portable API for
interrogating, controlling, and intercepting process state?
Essentially it would need to support all of the things that a powerful
debugger would want to do, including modifying registers and memory,
substituting syscall return values, etc.  I believe that utrace is
the kernel side of that API.

The killer app for this will be the ability to delete thousands of
lines of code from GDB, strace, and all the various other tools that
have to painfully work around the major interface gotchas of ptrace(),
while at the same time making their handling of complex processes much
more robust.

The *second* killer app for this is to make it much easier for people
to write new userspace debugging tools.  I love the various
crash-catching tools that different distributions or applications
provide, but they all basically have to trap the SIGSEGV and hope
they're still sensible enough to fork() and exec() a gdb process.

Furthermore, I would love to be able to write debugging tools for
scripting languages that allow me to step across Perl, C, PHP,
assembly code, etc, all within the same process.  In theory that's all
possible today, but given how much of a *pain* ptrace() is to use
correctly, nobody bothers.

Now, with all that said, utrace does not provide any of the
userspace side APIs today... but I think it is a necessary refactoring
if we want to provide a new ideal process-introspection interface
without breaking all the ptrace() users.

Think of the utrace interface as very much like the LSM interface.
Just like with LSMs, there is a lot of active research in debugging
and tracing tools, and nobody can even remotely agree what the hell
they want out of the hooks.  In theory you could add one hook for
every place each security module needs one... but then your fast-path
is littered with always-false test-and-jump statements.  What utrace
provides is the one single test in each fast path that then searches
for and executes the appropriate slow path(s) for that process.
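
The "one single test in each fast path" idea can be sketched in a few lines of C. This is only an illustration of the dispatch pattern described above and is not utrace's actual API: struct task, struct engine, attach_engine, report_event and the event names are all hypothetical. Each task keeps a mask that is the OR of what its attached engines asked for, so the fast path pays for one test and only walks the engine list when some engine cares.

```c
#include <stddef.h>

#define MAX_ENGINES 4

/* Hypothetical event bits; a real layer would have many more. */
enum event {
    EV_SYSCALL_ENTRY = 1 << 0,
    EV_SIGNAL        = 1 << 1
};

struct engine {
    unsigned int wanted;   /* events this engine asked to see */
    void (*callback)(struct engine *self, enum event ev);
    int hits;              /* bookkeeping for the example only */
};

struct task {
    unsigned int event_mask;   /* OR of every attached engine's wanted bits */
    struct engine *engines[MAX_ENGINES];
    size_t nr_engines;
};

/* Attach an engine and fold its interests into the per-task mask. */
int attach_engine(struct task *t, struct engine *e)
{
    if (t->nr_engines == MAX_ENGINES)
        return -1;
    t->engines[t->nr_engines++] = e;
    t->event_mask |= e->wanted;
    return 0;
}

/* Slow path: find and run every engine interested in this event. */
void report_event_slow(struct task *t, enum event ev)
{
    size_t i;

    for (i = 0; i < t->nr_engines; i++)
        if (t->engines[i]->wanted & ev)
            t->engines[i]->callback(t->engines[i], ev);
}

/* Fast path: a single test, no matter how many kinds of engine exist. */
static inline void report_event(struct task *t, enum event ev)
{
    if (t->event_mask & ev)
        report_event_slow(t, ev);
}

/* Example callback: just count how often the engine was invoked. */
void count_hit(struct engine *self, enum event ev)
{
    (void)ev;
    self->hits++;
}
```

An untraced task has event_mask == 0, so report_event() falls straight through. Attaching a second engine (say, a tracer next to a debugger) changes the slow path only; the hook site in the fast path stays the same single test.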

I personally would be very happy to see utrace merged.

Cheers,
Kyle Moffett



Re: linux-next: add utrace tree

2010-01-21 Thread Frank Ch. Eigler
Hi -

On Thu, Jan 21, 2010 at 04:31:45PM -0800, Andrew Morton wrote:
 [...]
  Someone please sell this to us.
 Here's what Oleg said last time I asked this: [...]

I wonder if Roland/Oleg are being too modest in their current role as
ptrace maintainers.  Considering that *they* think of utrace as a
means toward proper refactoring of ptrace, how much further burden of
proof should they shoulder?  To what extent are other subsystem
maintainers required to sell reworkings of their areas, when there
appear to be no drawbacks and at least arguable benefits?

- FChE



Re: linux-next: add utrace tree

2010-01-21 Thread Frank Ch. Eigler
Hi -

On Thu, Jan 21, 2010 at 05:05:41PM -0800, Andrew Morton wrote:

 [...]  ptrace is a nasty, complex part of the kernel which has a
 long history of problems, but it's all been pretty quiet in there
 for the past few years.  This leads one to expect that a
 rip-out-n-rewrite is a high-risk prospect.  So, quite reasonably,
 one looks for a good reason for taking such risk. [...]

To the extent the discussion is colored by risk avoidance, then the
answer to that would consist of code reviews, and of course a look at
the actual historical reliability of this code.  While some might
enjoy reminding us about the brief kerneloops incident in 2008, let's
keep in mind that versions of this code have been deployed in Fedora
and RHEL for several *years*, with millions of users.  It's not some
rickety experiment.

To the extent the discussion is colored by the new features enabled
from this refactoring, well, there is Oleg's list which may or may not
have mentioned enabling systemtap's user-space probing.  More details
can be furnished on demand.  Several of the use examples were
constructed in good faith upon request from the kernel community
asking for more and more.  But what's enough?  Who knows, really?


- FChE



Re: linux-next: add utrace tree

2010-01-21 Thread Linus Torvalds


On Thu, 21 Jan 2010, Andrew Morton wrote:
 
 ptrace is a nasty, complex part of the kernel which has a long history
 of problems, but it's all been pretty quiet in there for the past few
 years.

More importantly, we're not ever going to get rid of it. 

Quite frankly, judging by all past history we have ever seen in kernel 
interfaces, new and non-portable interfaces simply are never used. The 
whole question whether they are nicer or not is entirely immaterial. 

I'm personally very dubious that there are any merits to utrace that 
outweigh the very clear disadvantages: just another layer that adds a new 
level of abstraction to the only interface that people actually _use_, 
namely ptrace.

But I haven't followed utrace. I doubt _anybody_ has, except for the 
utrace people themselves.

Linus



Re: linux-next: add utrace tree

2010-01-21 Thread Linus Torvalds


On Thu, 21 Jan 2010, Frank Ch. Eigler wrote:
 
 To the extent the discussion is colored by the new features enabled
 from this refactoring, well, there is Oleg's list which may or may not
 have mentioned enabling systemtap's user-space probing.

Let's face it, system tap isn't going to be merged, so why even bring it 
up? Every kernel developer I have _ever_ seen agrees that all the new 
tracing is a million times superior. I'm sure there are system tap people 
who disagree, but quite frankly, I don't see it being merged considering 
how little the system tap people ever did for the kernel.

So if things like system tap and security models that go behind the 
kernel by tying into utrace are the reasons for utrace, color me utterly 
uninterested. In fact, color me actively hostile. I think that's the worst 
possible situation that we'd ever be in as kernel people (namely exactly 
the "do things in kernel space by hiding behind utrace without having 
kernel people involved" case)

Linus



Re: linux-next: add utrace tree

2010-01-21 Thread Frank Ch. Eigler
Hi -

On Thu, Jan 21, 2010 at 05:32:47PM -0800, Linus Torvalds wrote:
 [...]
  To the extent the discussion is colored by the new features enabled
  from this refactoring, well, there is Oleg's list which may or may not
  have mentioned enabling systemtap's user-space probing.
 
 Let's face it, system tap isn't going to be merged, so why even bring it 
 up?

It was certainly not meant to derail the discussion about the merits
of utrace as a useful cleanup API in its own right, but rather to be
an example of what kinds of things become straightforward in its
presence.  You may be aware of nascent efforts to bring the same
uprobes infrastructure to perf.

 Every kernel developer I have _ever_ seen agrees that all the new
 tracing is a million times superior. [...]

And that is fine.  We believe there is plenty of space in the problem
domain for different approaches.

 ... considering how little the system tap people ever did for the kernel.

Less passionate analysis would identify a long history of contribution
by the greater affiliated team, including via merged code and by
passing on requirements and experiences.  We have been trying to
share as much as you have been willing to take.  While systemtap's
current codebase may not (and need not) have a future inside the
kernel, chances are good that improvements in common infrastructure
will allow systemtap to shrink and change enough that the question
becomes moot.


- FChE



Re: linux-next: add utrace tree

2010-01-21 Thread Linus Torvalds


On Thu, 21 Jan 2010, Frank Ch. Eigler wrote:
 
 Less passionate analysis would identify a long history of contribution
 by the greater affiliated team, including via merged code and by
 passing on requirements and experiences.

The reason I'm so passionate is that I dislike the turn the discussion was 
taking, as if utrace was somehow _good_ because it allowed various other 
interfaces to hide behind it. And I'm not at all convinced that is true. 

And I really didn't want to single out system tap, I very much feel the 
same way about some seccomp-replacement security model that's a "the kernel 
doesn't even need to know about it" thing.

So don't take the systemtap part to be the important part, it's the bigger 
issue of "I'd much rather have explicit interfaces than have generic hooks 
that people can then use in any random way".

I realize that my argument is very antithetical to the normal CS teaching 
of "general-purpose is good". I often feel that very specific code with 
very clearly defined (and limited) applicability is a good thing - I'd 
rather have just a very specific ptrace layer that does nothing but 
ptrace, than a generic plugin layer that can be layered under ptrace and 
other things.

In one case, you know exactly what the users are, and what the semantics 
are going to be. In the other, you don't. 

So I really want to see a very big and immediate upside from utrace. 
Because to me, the "it's a generic layer with any application you want to 
throw at it" is a _downside_.

Linus



Re: linux-next: add utrace tree

2010-01-21 Thread Ananth N Mavinakayanahalli
On Thu, Jan 21, 2010 at 05:28:42PM -0800, Linus Torvalds wrote:
 
 
 On Thu, 21 Jan 2010, Andrew Morton wrote:
  
  ptrace is a nasty, complex part of the kernel which has a long history
  of problems, but it's all been pretty quiet in there for the past few
  years.
 
 More importantly, we're not ever going to get rid of it. 

FWIW, Oleg's implementation of ptrace over utrace is 100% compatible
with legacy ptrace; gdb testsuite indicates that
(http://lkml.org/lkml/2009/12/21/98).

Ananth




Fw: Re: linux-next: add utrace tree

2010-01-21 Thread Srikar Dronamraju
Hi Roland, Oleg,

Would it be a good idea to start looking at a user-space API for
utrace? That would give us the use cases that maintainers on LKML are
asking for, and let people start evaluating its usefulness.

Currently it's probably a chicken-and-egg situation: they want to see what
additional benefit end users get from utrace, while we are planning to
provide the user interface only after these bits go in.

--
Thanks and Regards
Srikar
---BeginMessage---
On Fri, 22 Jan 2010 11:17:47 +1100
Stephen Rothwell s...@canb.auug.org.au wrote:

 Any thoughts?

I'm nearly a week behind again and am trying to avoid thinking.

I've had a (n old) version of utrace in -mm for ages and it didn't
break anything.

I still don't think I've seen a really compelling reason for merging
it.  At least, I wouldn't be able to explain why we did it.  But
presumably there _are_ such reasons, because it was a lot of development work.

Someone please sell this to us.

---End Message---


Re: linux-next: add utrace tree

2010-01-20 Thread Frederic Weisbecker
On Wed, Jan 20, 2010 at 12:10:26PM +0530, Ananth N Mavinakayanahalli wrote:
  It will cause conflicts with various other trees and increases the overhead
  all around. It also causes us to trust linux-next bugreports less - as it's
  not the 'next Linux' anymore. Also, there's virtually no high-level technical
  review done in linux-next: the trees are implicitly trusted (because they are
  pushed by maintainers), bugs and conflicts are reported but otherwise it's a
  neutral tree that includes pretty much any commit indiscriminately.
  
  If you need review and testing there's a number of trees you can get inclusion
  into.
 
 So would -tip be one of them? If so could you pull the utrace-ptrace
 branch in?
 
 Or did you intend some other tree (random-tracing)? (Though I think a
 ptrace reimplementation isn't 'random'-tracing :-))


Heh. No, this is a tree I use for, well, random tracing patches indeed,
which has extended to random tracing/perf/* patches over time.
I sometimes relay others' patches to Ingo through this tree, but this is
usually about small volumes and short-term storage: patches that
have been reviewed/acked already.

utrace/uprobe is about high volume and longer-term debate/review/maintenance,
and I won't have the time to carry this.


 
 Ananth



Re: linux-next: add utrace tree

2010-01-20 Thread Frank Ch. Eigler
Hi -

On Wed, Jan 20, 2010 at 05:59:59PM +1100, Stephen Rothwell wrote:
 [...]
  Including experimental code that is RFC and which is not certain to go 
  upstream is certainly not the purpose of linux-next though.
 
 Ingo is correct in what he says here.  See the boilerplate:
 [...]
 Basically, this should be just what you would send to Linus (or ask him
 to fetch).
 I will remove this tree from linux-next tomorrow and wait until it is
 more ready for mainline inclusion.

Please reconsider.  Ingo mistook what was being proposed.  We request
merge/integration testing for just the set of patches posted at
http://lkml.org/lkml/2009/12/17/466, which was in response to
peterz's earlier review comments, and none of which is labeled or
considered RFC or experimental.

Ananth was right that the utrace-ptrace git branch represents this
rather than master.

- FChE




Re: linux-next: add utrace tree

2010-01-20 Thread Roland McGrath
 Frank, please be clear as to which branch you want included (master or
 utrace-ptrace).  Also note that neither of those branches matches what
 was posted in the sense that they both have lots of history and merges
 not represented in the patches.  (I assume that they do produce the same
 final source tree, though).

Yes, the trees do match.  I certainly never expected our ancient git
history to get merged in directly upstream.  I've made a new branch on:
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git
called:
next/master
(Actually it's on master.kernel.org and the public mirror is being a little
slow as I write this.)

This starts from v2.6.33-rc4 and then has commits for the 7 patches that
Oleg posted in December.  Beyond that, we've added one follow-on patch to
fix a bug Oleg just tracked down (Oleg will post that patch soon).  And
I've added one more commit with a MAINTAINERS update, shown below.

You can also find the same stuff from the series file and patch files in:
http://people.redhat.com/utrace/2.6-next/

If it makes things easier for linux-next to have this git branch either
rebased or merged from a different fork point, please let me know.


Thanks,
Roland

---
[PATCH] MAINTAINERS: add utrace

This updates the ptrace entry to cover utrace too.
They are part of the same maintenance effort.
Also add the utrace mailing list.

Signed-off-by: Roland McGrath rol...@redhat.com
---
 MAINTAINERS |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8f47bf..8da2a0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4375,15 +4375,18 @@ M:  Jim Paris j...@jtan.com
 L: cbe-oss-...@ozlabs.org
 S: Maintained
 
-PTRACE SUPPORT
+PTRACE AND UTRACE SUPPORT
 M: Roland McGrath rol...@redhat.com
 M: Oleg Nesterov o...@redhat.com
+L: utrace-devel@redhat.com
 S: Maintained
 F: include/asm-generic/syscall.h
 F: include/linux/ptrace.h
 F: include/linux/regset.h
 F: include/linux/tracehook.h
-F: kernel/ptrace.c
+F: include/linux/utrace.h
+F: kernel/ptrace*
+F: kernel/utrace*
 
 PVRUSB2 VIDEO4LINUX DRIVER
 M: Mike Isely is...@pobox.com



Re: linux-next: add utrace tree

2010-01-19 Thread Stephen Rothwell
Hi Frank,

On Tue, 19 Jan 2010 16:16:46 -0500 Frank Ch. Eigler f...@redhat.com wrote:

 Having been reviewed a couple of times, and we hope being a good
 candidate for merging next time, please start pulling
 
 git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git 
 branch master

I have added this from today with you and utrace-devel as the contacts.
I have cc'd the wider community on this email so that people are aware
that this has been included.

 This repo contains frequent merges from Linus' tree.  If you'd prefer
 a cleaner rebase-based branch to pull from, we can make one of those too.

For now it is OK, but you might like to ask Linus if he would like it
cleaned up before submission since it seems to have history right back to
2.6.29 and (as you say) lots of merges with his tree.

You should also add a commit with an entry in MAINTAINERS.

[Standard boilerplate]

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.




Re: linux-next: add utrace tree

2010-01-19 Thread Ananth N Mavinakayanahalli
On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote:

Ingo,

 Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, 
 etc.) can go upstream in its present form.

Agreed, uprobes is still not upstream ready -- it was an RFC. We are
working through the comments there to get it ready for merger.

 IMHO the far more important thing to address beyond formalities and workflow 
 cleanliness are the (many) technical observations and objections offered by 
 Peter Zijlstra on lkml. Not just the git history but also the abstractions and 
 concepts are messy and should be reworked IMO, and also good and working perf 
 events integration should be achieved, etc.

I think Oleg addressed most of Peter's concerns on utrace when the
ptrace/utrace patchset was reposted.

Perf integration with uprobes will be done and discussions have started
with Masami and Frederic. There are a couple of fundamental technical
aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not
through quiesce) that need resolution.

 The fact that there's a well established upstream workflow for instrumentation
 patches, which is being routed around by the utrace/uprobes/systemtap code
 here is not a good sign in terms of reaching a good upstream solution. Let's
 hope it works out well though.

Agreed.

On the other hand, having ptrace/utrace in the -next tree will give it a
lot more testing, while any outstanding technical issues are being addressed.

Stephen,
To exercise ptrace/utrace, it would be very useful if you pulled in

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch 
utrace-ptrace

instead of 'master'.

Thanks,
Ananth



Re: linux-next: add utrace tree

2010-01-19 Thread Stephen Rothwell
Hi Frank,

On Wed, 20 Jan 2010 07:28:34 +0100 Ingo Molnar mi...@elte.hu wrote:

 Including experimental code that is RFC and which is not certain to go 
 upstream is certainly not the purpose of linux-next though.

Ingo is correct in what he says here.  See the boilerplate:

 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).

I will remove this tree from linux-next tomorrow and wait until it is
more ready for mainline inclusion.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/




Re: linux-next: add utrace tree

2010-01-19 Thread Ingo Molnar

* Ingo Molnar mi...@elte.hu wrote:

 
 * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
 
  On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote:
  
  Ingo,
  
   Note, i'm not yet convinced that this (and the rest: uprobes and systemtap,
   etc.) can go upstream in its present form.
  
  Agreed, uprobes is still not upstream ready -- it was an RFC. We are
  working through the comments there to get it ready for merger.
  
   IMHO the far more important thing to address beyond formalities and workflow
   cleanliness are the (many) technical observations and objections offered by
   Peter Zijlstra on lkml. Not just the git history but also the abstractions and
   concepts are messy and should be reworked IMO, and also good and working perf
   events integration should be achieved, etc.
  
  I think Oleg addressed most of Peter's concerns on utrace when the 
  ptrace/utrace patchset was reposted.
 
 Peter is Cc:-ed and he might want to chime in.
 
  Perf integration with uprobes will be done and discussions have started with
  Masami and Frederic. There are a couple of fundamental technical aspects
  (XOL vma vs. emulation; breakpoint insertion through CoW and not through
  quiesce) that need resolution.
  
   The fact that there's a well established upstream workflow for instrumentation
   patches, which is being routed around by the utrace/uprobes/systemtap code
   here is not a good sign in terms of reaching a good upstream solution. Let's
   hope it works out well though.
  
  Agreed.
  
  On the other hand, having ptrace/utrace in the -next tree will give it a
  lot more testing, while any outstanding technical issues are being addressed.
 
 Including experimental code that is RFC and which is not certain to go 
 upstream is certainly not the purpose of linux-next though.
 
 It will cause conflicts with various other trees and increases the overhead 
 all around. It also causes us to trust linux-next bugreports less - as it's 
 not the 'next Linux' anymore. Also, there's virtually no high-level 
 technical review done in linux-next: the trees are implicitly trusted 
 (because they are pushed by maintainers), bugs and conflicts are reported 
 but otherwise it's a neutral tree that includes pretty much any commit 
 indiscriminately.
 
 If you need review and testing there's a number of trees you can get 
 inclusion into.

Btw., the utrace code has lived in -mm for quite some time - that's an 
excellent route as Andrew does thorough review and testing.

If Andrew agrees with this particular tree as-is and wants these bits to live 
in linux-next and have it in -mm that way then that's a fair approach 
obviously and i have no objections ...

The point is to have at least one relevant maintainer request and track it and 
then supervise the completion of it (which includes the resolution of all 
outstanding objections) and then push it to Linus.

Ingo