Re: linux-next: add utrace tree

2010-01-28 Thread Ingo Molnar

* Jim Keniston jkeni...@us.ibm.com wrote:

 On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
 ...
  I think the best solution for user probes (by far) is to use a simplified 
  in-kernel instruction emulator for the few commonly probed instructions. 
  (Kprobes already partially decodes x86 instructions to make it safe to apply 
  accelerated probes and there's other decoding logic in the kernel too.)
  
  The design and practical advantages are numerous:
  
   - People want to probe their function prologues most of the time ...
 a single INT3 there will in most cases just hit the initial stack 
 allocation and that's it.
 
 Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps 
 on x86 (but see below**). [...]

Coverage in practice is all that matters.

Consider the fact that i get 1000 times more bugreports aided by strace, which 
has 1000 times more overhead than even the slowest of uprobes approaches.

This simple fact tells us that while performance matters, it is of little use 
if good utility and a clean design are not there. (in fact sane and clean 
design will almost automatically result in good performance too down the line, 
but i digress.) Faster crap is still crap.

 [...]  Even there, though, we'd have to address the page fault we'd 
 occasionally get when extending the stack vma.

Nope, in the simplest model not even page fault emulation is needed, 
get_user()/put_user() would resolve it automatically. You either get the 
value with the pagefault resolved, or you get a -EFAULT.

If you concentrate only on the common case then emulation can be _really_ 
simple.
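
To make the push %ebp case concrete, here is a minimal user-space model of what such an emulation step might look like. It is a sketch only, not code from the uprobes patch-set: struct model_task, model_put_user32() and emulate_push_ebp() are hypothetical names, and the flat array merely stands in for the user stack. In the kernel, put_user() would try to fault the page in before reporting -EFAULT; the model simply treats anything outside the array as a failed store.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a flat array stands in for the mapped user stack. */
#define STACK_BASE 0x1000u   /* lowest mapped address in the model */
#define STACK_SIZE 64u       /* bytes mapped starting at STACK_BASE */

struct model_task {
    uint32_t sp;                 /* models %esp */
    uint32_t bp;                 /* models %ebp */
    uint32_t ip;                 /* models %eip, pointing at the probed insn */
    uint8_t  stack[STACK_SIZE];  /* models the user stack pages */
};

/* Stand-in for put_user(): the store either succeeds or returns -EFAULT. */
static int model_put_user32(struct model_task *t, uint32_t addr, uint32_t val)
{
    if (addr < STACK_BASE || addr > STACK_BASE + STACK_SIZE - sizeof(val))
        return -EFAULT;
    memcpy(&t->stack[addr - STACK_BASE], &val, sizeof(val));
    return 0;
}

/* Emulate the one-byte instruction "push %ebp" (opcode 0x55). */
static int emulate_push_ebp(struct model_task *t)
{
    uint32_t new_sp;
    int err;

    assert(t != NULL);
    new_sp = t->sp - 4;
    err = model_put_user32(t, new_sp, t->bp);
    if (err)
        return err;   /* registers untouched; caller can fall back */
    t->sp = new_sp;   /* commit the stack pointer only after the store */
    t->ip += 1;       /* step past the emulated instruction */
    return 0;
}
```

The register state is only committed after the store has succeeded, so a -EFAULT leaves the task exactly as it was and some other mechanism can take over.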

Let's compare the two cases via a drawing. Your current uprobes submission 
does:

 [kernel]     do probe thing   single-step trap
               ^          |    ^             |
               |          v    |             v
 [user]       INT3       XOL-ins             next ins-stream

 ( add the need for serialization to make sure the whole single-step thing 
   does not get out of sync with reality. )

An emulator approach would do:

 [kernel]     emul-demux-fastpath, do probe thing
               ^                               |
               |                               v
 [user]       INT3                             next ins-stream

far simpler conceptually, and faster as well, because it's one kernel entry.
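
The "emul-demux-fastpath" box could be sketched roughly as follows. This is an illustration under stated assumptions rather than anything from the submitted patches: enum probe_path, demux_probed_insn() and kernel_entries() are made-up names, and the set of "known" opcodes (one-byte push %reg, 0x50..0x57, plus nop, 0x90) is just an example of a small common-case subset. Anything else takes the XOL path from the first drawing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum probe_path {
    PATH_EMULATED,   /* handled inside the INT3 trap: one kernel entry */
    PATH_XOL,        /* run out of line, then single-step trap: two entries */
};

/*
 * Hypothetical demux: decide from the first opcode byte whether the
 * probed instruction is in the small emulated subset.
 */
static enum probe_path demux_probed_insn(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len >= 1) {
        uint8_t op = insn[0];

        /* 0x50..0x57: push %reg, 0x90: nop */
        if ((op >= 0x50 && op <= 0x57) || op == 0x90)
            return PATH_EMULATED;
    }
    return PATH_XOL;   /* not emulated: fall back to the out-of-line scheme */
}

/* Kernel entries per probe hit, as counted in the two drawings. */
static unsigned kernel_entries(enum probe_path path)
{
    return path == PATH_EMULATED ? 1u : 2u;
}
```

A push %ebp (0x55) at a function entry takes the one-entry path, while something like an FP instruction (for example 0xd9 0xe8, fld1) still costs two entries.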

Generally i get nervous if a piece of instrumentation cannot be expressed in 
simple ways. _Especially_ if i consider it to concentrate on all the wrong 
things and doesn't even break even with a far less complex scheme.

What would be the 'right things' to concentrate on? Make sure it's an 
all-around end-to-end package that is _useful to people_. As of today i have 
yet to get a _single_ bugreport or kernel improvement requested by an 
application writer who found out about the inefficiencies in his app using 
uprobes. There is a gaping hole of utility here, a whole cathedral of tools 
written that just a handful of ordinary Linux people use. There's a big 
disconnect and i can say one thing for sure: needless complexity in the wrong 
places can outright stifle tools from becoming good.

  We could get quite good coverage (and very fast 
  emulation) for the common case in not too much code - and much of that 
  code we already have available. No re-trapping,
 
 As previously discussed, boosting would also get rid of the single-step trap 
 for most instructions.

Boosting is not in the uprobes patch-set you submitted. Even with it present 
it won't get rid of the initial INT3. So basically _best-case_ (with boosting) 
XOL-uprobes could roughly break even with a pure emulator approach ...

That's a big and fundamental difference.

  no extra instruction patching 
 
 x86_64 rip-relative instructions are the only ones we alter.
 
 and complex maintenance of trampolines.
  
   - It's as transparent as it gets - no user-space trampoline or other 
     visible state that modifies behavior or can be stomped upon by 
     user-space bugs.
 
 The XOL vma isn't writable from user space, so I can't think of how it could 
 be clobbered merely by a stray memory reference. [...]

Well there must be some purpose to the instrumentation, there must be some way 
to save data, right? If yes and it's in user-space, that data is clobberable. 
If it's in kernel-space then we have to enter the kernel anyway (with similar 
cost patterns to an INT3 entry) - so we just delayed the kernel entry.

So IMHO you have designed in considerable complexity for little immediate 
benefit.

 [...]  Yes, it's a vma that the unprobed app would never have; and yes, a 
 malicious app or kernel module could remove it or alter the protection and 
 scribble on it.  We don't try to defend the app against such malicious 
 attacks, but we do our best to ensure that the kernel side handles such 
 attacks gracefully.
 
   - Lightweight and simple probe insertion: no weird setup sequence needing 
  the 

Re: linux-next: add utrace tree

2010-01-28 Thread Benjamin Herrenschmidt
On Mon, 2010-01-25 at 08:52 -0800, Linus Torvalds wrote:
 
 That said, I also suspect that people should still look seriously at 
 simply just improving ptrace. For example, I suspect that the biggest 
 problem with ptrace is really just the signalling, and that creating a
 new extension for JUST THAT, and then having a model where you can choose
 - at PTRACE_ATTACH time - how to wait for events would be a good thing.

like returning a fd to poll() on ? :-)

Cheers,
Ben.




Re: linux-next: add utrace tree

2010-01-28 Thread Linus Torvalds


On Fri, 29 Jan 2010, Benjamin Herrenschmidt wrote:
 
 like returning a fd to poll() on ? :-)

Well, there's the possibility of async polling (rather than the 
synchronous wait that ptrace forces now), but there are other advantages 
to having a connection model - like not having to look up the child 
process every time like ptrace does now.

Although 'find_task_by_vpid()' is probably cheap enough that nobody really 
cares. We do a fair job at those hash tables.

Linus
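
For readers unfamiliar with the idea, waiting on a descriptor with poll() looks roughly like this. No ptrace interface returning such a descriptor is described anywhere in this thread, so the sketch makes no claim about one: wait_for_event() is a hypothetical helper, and any readable descriptor (a pipe is enough) can stand in for the event source.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for event_fd to become readable.
 * Returns 1 if an event is pending, 0 on timeout, -1 on error.
 */
static int wait_for_event(int event_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = event_fd, .events = POLLIN };
    int n;

    assert(event_fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

With a timeout of 0 this is a non-blocking check, and the same pollfd could sit in a larger array next to sockets and other descriptors, which is the sort of asynchronous waiting the synchronous ptrace wait does not allow.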



Re: linux-next: add utrace tree

2010-01-28 Thread Jim Keniston
On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote:
 * Jim Keniston jkeni...@us.ibm.com wrote:
 
  On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
  ...
  
  Yes, emulating push %ebp would buy us a lot of coverage for a lot of apps 
  on x86 (but see below**). [...]
 
...
 
  [...]  Even there, though, we'd have to address the page fault we'd 
  occasionally get when extending the stack vma.
 
 Nope, in the simplest model not even page fault emulation is needed, 
 get_user()/put_user() would resolve it automatically. You either get the 
 value with the pagefault resolved, or you get a -EFAULT.

get_user()/put_user() have to be done in a context where you can sleep,
right?  Uprobes currently operates in such contexts, but there's some
talk of moving it all to a DIE_INT3 notifier context, where it can't
sleep.

...

 
   We could get quite good coverage (and very fast 
   emulation) for the common case in not too much code - and much of that 
   code we already have available. No re-trapping,
  
  As previously discussed, boosting would also get rid of the single-step 
  trap for most instructions.
 
 Boosting is not in the uprobes patch-set you submitted. Even with it present 
 it won't get rid of the initial INT3. So basically _best-case_ (with boosting) 
 XOL-uprobes could roughly break even with a pure emulator approach ...
 
 That's a big and fundamental difference.

To be fair, wrt uprobes, emulation and boosting are both in the same
state: pretty well understood, but not yet implemented.

...
   
- It's as transparent as it gets - no user-space trampoline or other 
   visible
  state that modifies behavior or can be stomped upon by user-space bugs.
  
  The XOL vma isn't writable from user space, so I can't think of how it 
  could be clobbered merely by a stray memory reference. [...]
 
 Well there must be some purpose to the instrumentation, there must be some 
 way to save data, right? If yes and it's in user-space, that data is 
 clobberable.

One or two others have advocated an approach (which eliminates the
breakpoint trap) where trace data is stored in the uprobe vma, but I
haven't.  (In such a case, XOL vma would be a misnomer.)  I agree that
in such a scenario, the uprobe vma would of necessity be writable by the
app.

  
 If it's in kernel-space then we have to enter the kernel anyway (with similar 
 cost patterns to an INT3 entry) - so we just delayed the kernel entry.

This seems to presume that you have to extract trace data from the
kernel every time a probe is hit.  In actual practice, you're often just
checking for unusual arg values, incrementing a counter, or some such.
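
As a rough illustration of that kind of handler (a sketch with made-up names, not the uprobes handler interface): struct probe_stats and probe_handler() are hypothetical, and the "unusual" test is an arbitrary threshold. The point is only that a hit can be filtered and counted in place, with nothing handed to a tracer unless it is interesting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct probe_stats {
    uint64_t hits;      /* every time the probe fired */
    uint64_t unusual;   /* hits whose argument exceeded the limit */
};

/*
 * Hypothetical probe handler: count the hit and report (return 1) only
 * when the probed argument looks unusual.
 */
static int probe_handler(struct probe_stats *stats, uint64_t arg, uint64_t limit)
{
    assert(stats != NULL);
    stats->hits++;
    if (arg > limit) {
        stats->unusual++;
        return 1;   /* worth recording in detail */
    }
    return 0;       /* common case: nothing leaves the handler */
}
```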

 
...
  Even if we add emulation, it seems sensible to keep the XOL approach as a 
  backup to handle instructions that aren't yet emulated (and architectures 
  that don't yet have emulators).  That way, if you don't probe any 
  unemulated instructions, the XOL vma is never created.
 
 To turn the argument around: an in-kernel emulator is an all-around facility 
 to make sure we probe safely and securely, _and_ it is also more portable 
 because it's simpler (because more gradual) to implement on a new 
 architecture, as you don't actually have to copy around instructions (and 
 make sure they work in that new place), but have to emulate a limited subset 
 of the instruction space, on purely local state.

I understand the desire to start small and simple and grow gradually
from there.  We thought we were doing that.  Single-stepping out of line
has been in use for close to a decade, maybe more; and boosting (in
kprobes) has been around for a few years as well.  To the *probes folks,
it feels pretty solid.

 
...
 
 With an emulator (assuming the emulator is correct) we can execute the 
 precise semantics of that instruction in that place - without any 
 side-effects from trampolining/replacement.

And of course, our view has been that the best way to achieve the effect
of the instruction, including all desired side-effects, is to execute
the instruction on the CPU.

...
  
  **In practice, we've had to probe all sorts of instructions, including FP 
  instructions -- especially where you want to exploit the debug info to get 
  the names, types, and locations of variables and args.  For some compilers 
  and architectures, the debug info isn't reliable until the end of the 
  function prologue, at which point you could find any old instruction.  
  Ditto if you want to probe statements within a function.
 
 For those cases, frankly, the right approach is to fix the debug info (or 
 introduce a new one) and forget the old crap.
 
 You treat debuginfo as some god-given property, while it's one of the 
 suckiest aspects of all of Linux. But we've had that discussion months (and 
 years) ago. It has improved in gcc 4.5 so there's some hope.

Yes, there seems to be considerable movement toward better debug info --
which could make statement probing (and not just 

Re: linux-next: add utrace tree

2010-01-28 Thread Ananth N Mavinakayanahalli
On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:

...

 Let's compare the two cases via a drawing. Your current uprobes submission 
 does:
 
  [kernel]     do probe thing   single-step trap
                ^          |    ^             |
                |          v    |             v
  [user]       INT3       XOL-ins             next ins-stream
 
  ( add the need for serialization to make sure the whole single-step thing 
    does not get out of sync with reality. )
 
 An emulator approach would do:
 
  [kernel]     emul-demux-fastpath, do probe thing
                ^                               |
                |                               v
  [user]       INT3                             next ins-stream
 
 far simpler conceptually, and faster as well, because it's one kernel entry.

Ingo,

Yes, conceptually, emulation is simpler. In fact, it may even be the
right thing to do from a housekeeping POV if gdb were enabled to use
breakpoint assistance in the kernel. However... emulation is not
easy. Just quoting Peter Anvin:

 On the more general rule of interpretation: I'm really concerned about
 having a bunch of partially-capable x86 interpreters all over the
 kernel.  x86 is *hard* to emulate, and it will only get harder as the
 architecture evolves.

   -hpa

Yes, I know you suggested we start with a small subset.

We already have an implementation of instruction emulation in the kernel for
x86 and powerpc, but it's too KVM-centric. If there is a generic
emulation layer, we would use it.

There are conflicting opinions for either case; complicated as it is,
the XOL scheme works and, to a large extent, it is more easily extendable
to other architectures than the emulation approach. Uprobes can be
made to use emulation when possible/available, but I don't think this
should be a gating decision for the initial implementation of the feature.

Ananth