Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Avi Kivity

On 01/16/2010 02:58 AM, Jim Keniston wrote:


I hear (er, read) you.  Emulation may turn out to be the answer for some
architectures.  But here are some things to keep in mind about the
various approaches:

1. Single-stepping inline is easiest: you need to know very little about
the instruction set you're probing.  But it's inadequate for
multithreaded apps.
2. Single-stepping out of line solves the multithreading issue (as do #3
and #4), but requires more knowledge of the instruction set.  (In
particular, calls, jumps, and returns need special care; as do
rip-relative instructions in x86_64.)  I count 9 architectures that
support kprobes.  I think most of these do SSOL.
3. Boosted probes (where an appended jump instruction removes the need
for the single-step trap on many instructions) require even more
knowledge of the instruction set, and like SSOL, require XOL slots.
Right now, as far as I know, x86 is the only architecture with boosted
kprobes.
4. Emulation removes the need for the XOL area, but requires pretty much
total knowledge of the instruction set.  It's also a performance win for
architectures that can't do #3.  I see kvm implemented on 4
architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
architectures to which uprobes (old uprobes, with ubp and xol bundled
in) has already been ported (though Intel hasn't been maintaining their
ia64 port).  So it sort of comes down to how objectionable the XOL vma
(or page) really is.
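
To illustrate the "special care" that item 2 mentions for rip-relative instructions, here is a minimal C sketch of one way to adjust a 32-bit displacement when an instruction is copied to an out-of-line slot. The function name and parameters are hypothetical and do not come from the UBP patches. It assumes both addresses are the start of the same, unmodified instruction, so the instruction length cancels out of the calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the UBP/uprobes patches.
 *
 * A rip-relative operand resolves to: insn_addr + insn_len + disp32.
 * If the instruction is copied from orig_addr to copy_addr, the copy
 * must use new_disp = old_disp + (orig_addr - copy_addr) to keep
 * pointing at the same target.
 *
 * Returns false when the adjusted displacement no longer fits in a
 * signed 32-bit field, in which case some other strategy is needed.
 */
static bool fixup_rip_disp(uint64_t orig_addr, uint64_t copy_addr,
                           int32_t old_disp, int32_t *new_disp)
{
    int64_t delta = (int64_t)(orig_addr - copy_addr);
    int64_t disp;

    /* Reject far-apart addresses early; this also avoids overflow below. */
    if (delta > INT64_C(0xffffffff) || delta < -INT64_C(0xffffffff))
        return false;

    disp = (int64_t)old_disp + delta;
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;

    *new_disp = (int32_t)disp;
    return true;
}
```

The failure case matters for user-space probes: if the XOL slot is more than 2 GB away from the probed instruction, a plain displacement fixup cannot work and the instruction would have to be rewritten or handled another way.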
   


The kvm emulator emulates only a subset of the x86 instruction set 
(basically mmio instructions and commonly-used page-table manipulation 
instructions, as well as some privileged instructions).  It would take a 
lot of work to expand it to be completely generic; and even then it will 
fail if userspace uses an instruction set extension the kernel is not 
aware of.


To me, boosted probes with a fallback to single-stepping seems to be the 
better option by far.
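
A rough sketch of what "boosted with a fallback to single-stepping" could look like as a decision function, in C. All names here are hypothetical, and the boostability test is deliberately conservative; a real implementation would need a much more detailed instruction analysis.

```c
#include <stdbool.h>

/* Hypothetical names; none of these come from the UBP patch set. */
enum probe_strategy {
    PROBE_BOOSTED,     /* copied insn plus appended jump, no single-step trap */
    PROBE_SINGLE_STEP  /* single-step out of line in an XOL slot */
};

struct probed_insn {
    bool decodable;    /* the decoder fully recognized the instruction */
    bool changes_flow; /* call, jump, return, and similar */
};

/*
 * Boost only when we are sure it is safe; anything the decoder does not
 * understand, or anything that redirects control flow, falls back to
 * single-stepping.
 */
static enum probe_strategy pick_strategy(const struct probed_insn *insn)
{
    if (insn->decodable && !insn->changes_flow)
        return PROBE_BOOSTED;
    return PROBE_SINGLE_STEP;
}
```

The point of the fallback is that an unknown instruction set extension costs only performance (a single-step trap), not correctness.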


--
error compiling committee.c: too many arguments to function



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote:
 On 01/17/2010 04:52 PM, Peter Zijlstra wrote:

  Also, if it's fixed size you're imposing artificial limits on the number
  of possible probes.
 
 
 Obviously we'll need a limit; a uprobe will also take kernel memory, and
 we can't allow people to exhaust it.

Only if it's unprivileged; kernel and root should be able to place
probes until the machine keels over.
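
The policy being argued for here (a cap for unprivileged users, none for kernel and root) can be sketched in C as follows. The limit value and all names are hypothetical; nothing in the thread specifies them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user cap; the thread does not propose a number. */
#define UNPRIV_PROBE_LIMIT 64

/*
 * Privileged callers are never refused on count alone; unprivileged
 * callers are refused once they reach the cap.
 */
static bool may_add_probe(bool privileged, size_t active_probes)
{
    if (privileged)
        return true;
    return active_probes < UNPRIV_PROBE_LIMIT;
}
```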



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 16:59 +0200, Avi Kivity wrote:
 On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
  On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
 
  On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
   
  As previously stated, I think poking at a process's address space is an
  utter no-go.
 
 
  Why not reserve an address space range for this, somewhere near the top
  of memory?  It doesn't have to be populated if it isn't used.
   
  Because I think poking at a process's address space like that is gross.
  Also, if it's fixed size you're imposing artificial limits on the number
  of possible probes.
 
 
 btw, an alternative is to require the caller to provide the address 
 space for this.  If the caller is in another process, we need to allow 
 it to play with the target's address space (i.e. mmap_process()).  I 
 don't think uprobes justifies this by itself, but mmap_process() can be 
 very useful for sandboxing with seccomp.

mmap_process() sounds utterly gross, one process playing with another
process's address space.. yuck!



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Avi Kivity

On 01/17/2010 05:03 PM, Peter Zijlstra wrote:



btw, an alternative is to require the caller to provide the address
space for this.  If the caller is in another process, we need to allow
it to play with the target's address space (i.e. mmap_process()).  I
don't think uprobes justifies this by itself, but mmap_process() can be
very useful for sandboxing with seccomp.
 

mmap_process() sounds utterly gross, one process playing with another
process's address space.. yuck!
   


This is debugging.  We're playing with registers, we're playing with the 
cpu, we're playing with memory contents.  Why not the address space as well?


For seccomp, this really should be generalized.  Run a system call on 
behalf of another process, but don't let that process do anything to 
affect it.  I think Google is doing something clever with one thread in 
seccomp mode and another unconstrained, but that's very hacky - you have 
to stop the constrained thread so it can't interfere with the live one.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sat, 2010-01-16 at 18:48 -0500, Jim Keniston wrote:

 As you may have noted before, I think FP would be a special problem  
 for your approach.  I'm not sure how folks would react to the idea of  
 executing FP instructions in kernel space.  But emulating them is also  
 tough.  There's an IEEE FP emulation package somewhere in one of the  
 Linux arch directories, but I'm not sure how precise it is, and  
 dropping even 1 bit of precision is unacceptable for many  
 applications, since such errors tend to grow in complex computations  
 employing many FP instructions.

Well, we have kernel space using FP/MMX/SSE-like things; it's not hard if
you really need it. But in this case I think it's easier than normal,
because we'll just allow it to change the userspace state, since that
is exactly what we want it to do.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sat, 2010-01-16 at 19:12 -0500, Bryan Donlan wrote:
 On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston jkeni...@us.ibm.com wrote:
 
  4. Emulation removes the need for the XOL area, but requires pretty much
  total knowledge of the instruction set.  It's also a performance win for
  architectures that can't do #3.  I see kvm implemented on 4
  architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
  architectures to which uprobes (old uprobes, with ubp and xol bundled
  in) has already been ported (though Intel hasn't been maintaining their
  ia64 port).  So it sort of comes down to how objectionable the XOL vma
  (or page) really is.
 
 On x86 at least, wouldn't one option be to run the instruction to
 be emulated in CPL ('ring') 2, from a XOL page above the user-kernel
 split, not accessible to userspace at CPL 3? Linux hasn't
 traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on
 Xen), but it would seem to avoid many of the problems here - it's
 invisible to normal userspace code and so doesn't pollute userspace
 memory maps with kernel-private stuff, but since it's running at a
 higher CPL than the kernel, we can still protect kernel memory and
 protect against privileged instructions.

Another option is to go play games with the RPL of the user data
segments when we load them. But yeah, something like this seems to
nicely deal with the protection issues.
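
For readers unfamiliar with the terms: the RPL is the low two bits of an x86 segment selector, next to the table-indicator bit and the descriptor index. The C sketch below shows only that bit layout. The helper names are hypothetical, and the selector value 0x2b used in the usage note is just an example, not something stated in the thread.

```c
#include <stdint.h>

/* x86 segment selector layout: bits 0-1 RPL, bit 2 TI, bits 3-15 index. */
#define SEL_RPL_MASK 0x3u
#define SEL_TI_MASK  0x4u

/* Requested privilege level encoded in the selector (0..3). */
static unsigned sel_rpl(uint16_t sel)
{
    return sel & SEL_RPL_MASK;
}

/* Index of the descriptor within the GDT or LDT. */
static unsigned sel_index(uint16_t sel)
{
    return sel >> 3;
}

/* Same descriptor and table, different requested privilege level. */
static uint16_t sel_with_rpl(uint16_t sel, unsigned rpl)
{
    return (uint16_t)((sel & ~SEL_RPL_MASK) | (rpl & SEL_RPL_MASK));
}
```

For example, `sel_with_rpl(0x2b, 2)` yields 0x2a: the same descriptor index with RPL 2 instead of 3. This only shows the encoding; whether a given load is permitted still depends on the descriptor's DPL and the current CPL.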



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 21:33 +0200, Avi Kivity wrote:
 On 01/17/2010 05:03 PM, Peter Zijlstra wrote:
 
  btw, an alternative is to require the caller to provide the address
  space for this.  If the caller is in another process, we need to allow
  it to play with the target's address space (i.e. mmap_process()).  I
  don't think uprobes justifies this by itself, but mmap_process() can be
  very useful for sandboxing with seccomp.
   
  mmap_process() sounds utterly gross, one process playing with another
  process's address space.. yuck!
 
 
 This is debugging.  We're playing with registers, we're playing with the 
 cpu, we're playing with memory contents.  Why not the address space as well?

Because you want things to be as transparent as possible in order to
avoid heisenbugs. Sure, we cannot avoid everything, but we should avoid
everything we possibly can.

Also, aside from the VDSO, we simply do not force-map things into address
spaces (and as I said before, I think the VDSO stinks for doing that),
and I think we don't want to create (more) precedents in this case.