Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-02-07 Thread Avi Kivity

On 01/27/2010 12:23 PM, Ingo Molnar wrote:

* Avi Kivity <a...@redhat.com> wrote:
   


(back from vacation)


If so, then you ignore the obvious solution to _that_ problem: don't use
INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks.
It's _MUCH_ faster than _any_ breakpoint-based solution: literally just
the cost of a function call (or not even that; I've written very fast
inlined tracers, and they do rock when it comes to performance). Problem
solved, and none of the INT3 details matter at all.
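A minimal sketch of the "explicit callbacks" alternative, written in Python for brevity: the trace point is an ordinary check at the call site, so a disabled probe costs one test and an enabled one costs one function call, with no trap into the kernel. `trace_hook`, `trace_point`, `record` and `work` are hypothetical names for illustration, not any real tracer's API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hook slot; None means tracing is disabled.
TraceHook = Callable[[str, int], None]
trace_hook: Optional[TraceHook] = None


def trace_point(site: str, value: int) -> None:
    """Compiled-in trace point: a plain test plus an ordinary call, no INT3."""
    if trace_hook is not None:
        trace_hook(site, value)


def work(x: int) -> int:
    trace_point("work:entry", x)  # the explicit callback built into the program
    return x * 2


records: List[Tuple[str, int]] = []


def record(site: str, value: int) -> None:
    records.append((site, value))


work(1)              # tracing off: nothing is recorded
trace_hook = record  # "attach" the tracer by filling the hook slot
work(21)
work(5)
print(records)       # [('work:entry', 21), ('work:entry', 5)]
```

The trade-off discussed in the thread is visible here: the probe sites are fixed when the program is built, so adding a new one means rebuilding.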
   

However did I not think of that?  Yes, and let's rip kprobes tracing out
of the kernel; we can always rebuild it.

Well, I'm observing an issue in a production system now.  I may not want to
take it down, or if I take it down I may not be able to observe it again as
the problem takes a couple of days to show up, or I may not have the full
source, or it takes 10 minutes to build and so an iterative edit/build/run
cycle can stretch for hours.
 

You have somewhat misconstrued my argument. What I said above is that _if_ you
need extreme levels of performance you always have the option to go even
faster via specialized tracing solutions. I did not promote it as a
replacement solution. Specialization obviously brings in a new set of
problems: inflexibility and non-transparency, examples of which you gave
above.

Your proposed solution brings in precisely such kinds of issues, on a
different level, just to improve performance at the cost of transparency and
at the cost of features and robustness.
   


We just disagree on the intrusiveness, then.  IMO it will be a very rare 
application that really suffers from a vma injection, since most apps 
don't manage their vmas directly but leave it to the kernel and ld.so.



It's rather ironic, by the way, as your arguments are somewhat similar to the
Xen vs. KVM argument, just turned around: KVM started out slower by relying on
hardware support for virtualization, while Xen relied on a clever but limiting
hack. With each CPU generation the hardware got faster, while Xen's various
design limitations are hurting it and KVM is winning that race.

A (partially) similar situation exists here: INT3 into ring 0 and handling it
there in a protected environment might be more expensive, but _if_ it matters
to performance it sure could be made faster in hardware (and in fact it will
become faster with every new generation of hardware).
   


Not at all.  For kvm, hardware eliminates exits completely, whereas pv Xen
tries to reduce their cost; but an INT3 will always be much more
expensive than a jump.


You are right however that we should favour hardware support where 
available, and for high bandwidth tracing, it is available: branch trace 
store.  With that, it is easy to know how many times the processor 
passed through some code point as well as to reconstruct the entire call 
chain, basically what the function tracer does for the kernel.
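The pass-count idea can be sketched as follows, assuming a complete, gap-free branch trace for a single thread: execution between the target of one taken branch and the source of the next is straight-line, so an address is executed once for every such segment that covers it. The `BranchRecord` layout is a simplification (real BTS records carry more than the two addresses), and all names here are hypothetical.

```python
from typing import NamedTuple, Sequence


class BranchRecord(NamedTuple):
    """Simplified taken-branch record: source and target addresses only."""
    src: int
    dst: int


def pass_count(records: Sequence[BranchRecord], addr: int) -> int:
    """Count how many straight-line segments of the trace covered `addr`.

    Each segment runs from the target of one taken branch to the source
    of the next one. The tail after the last record is not counted,
    since its end is unknown.
    """
    count = 0
    for prev, nxt in zip(records, records[1:]):
        if prev.dst <= addr <= nxt.src:
            count += 1
    return count


# A loop body at 0x1000..0x1010 whose back-edge is taken twice,
# then falls through and returns from 0x1020.
trace = [
    BranchRecord(0x500, 0x1000),   # call into the function
    BranchRecord(0x1010, 0x1000),  # loop back-edge, taken
    BranchRecord(0x1010, 0x1000),  # loop back-edge, taken again
    BranchRecord(0x1020, 0x504),   # return to the caller
]
print(pass_count(trace, 0x1008))   # 3
```

Reconstructing the call chain would work the same way: walk the records and push or pop a stack on call and return branches, which needs the branch type that this simplified record omits.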


Do we have facilities for exposing that to userspace?  It can also be 
very useful for the kernel.


It will still be slower if we only trace a few points, and it can't 
trace register and memory values, but it's a good tool to have IMO.



Both Peter and I are telling you that we consider your solution too
specialized, at the cost of flexibility, features and robustness.
   


We'll agree to disagree on that then.

--
error compiling committee.c: too many arguments to function



Re: linux-next: add utrace tree

2010-02-07 Thread Pavel Machek
Hi!

  Right, so you're going to love uprobes, which does exactly that. The
  current proposal is overwriting the target instruction with an INT3 and
  injecting an extra vma into the target process's address space
  containing the original instruction(s) and possible jumps back to the
  old code stream.
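The mechanism described above (INT3 over the probed instruction, the original copied into an injected area, followed by a jump back) can be modeled as a toy in Python. This follows the proposal as summarized here, not any merged kernel code; `ToyProcess`, `insert_probe`, `on_trap` and `remove_probe` are hypothetical names. It assumes the injected area lies within ±2 GiB of the text so a rel32 jump can reach it, and it ignores RIP-relative fixups and concurrency.

```python
import struct
from typing import Dict, Tuple

INT3 = 0xCC       # x86 one-byte breakpoint opcode
JMP_REL32 = 0xE9  # x86 near jump with a 32-bit displacement


class ToyProcess:
    """Text bytes at `text_base` plus an injected out-of-line area at `xol_base`."""

    def __init__(self, text: bytes, text_base: int, xol_base: int) -> None:
        self.text = bytearray(text)
        self.text_base = text_base
        self.xol = bytearray()  # stands in for the extra vma
        self.xol_base = xol_base
        self.probes: Dict[int, Tuple[bytes, int]] = {}  # addr -> (orig, slot)

    def insert_probe(self, addr: int, insn_len: int) -> int:
        """Copy the instruction out of line, append a jump back, patch INT3."""
        off = addr - self.text_base
        orig = bytes(self.text[off:off + insn_len])
        slot = self.xol_base + len(self.xol)
        self.xol += orig
        jmp_end = slot + insn_len + 5        # address right after the jmp
        rel = (addr + insn_len) - jmp_end    # back to the following instruction
        self.xol += struct.pack("<Bi", JMP_REL32, rel)
        self.text[off] = INT3                # only the first byte is replaced
        self.probes[addr] = (orig, slot)
        return slot

    def on_trap(self, addr: int) -> int:
        """Simulated INT3 handler: resume at the out-of-line copy.
        (A real handler would derive `addr` from the trapping IP minus 1.)"""
        return self.probes[addr][1]

    def remove_probe(self, addr: int) -> None:
        orig, _ = self.probes.pop(addr)
        self.text[addr - self.text_base] = orig[0]


# push %rbp; mov %rsp,%rbp; ret
p = ToyProcess(bytes([0x55, 0x48, 0x89, 0xE5, 0xC3]), 0x400000, 0x500000)
slot = p.insert_probe(0x400001, 3)  # probe the 3-byte mov
print(p.text.hex())                 # 55cc89e5c3
```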
 
  Just out of interest, how does it handle the threading issue?
 
  Last I saw, at least some CPU people were _very_ nervous about overwriting 
  instructions if another CPU might be just about to execute them.
  
  I think the issue was that ring 0 was never meant to do that, whereas
  ring 3 does it all the time. Doesn't the dynamic library modify its
  text?
 
 No, it has nothing to do with ring.  It has to do with modifying code
 that another CPU could be executing at the same time, and with modifying
 code on the same processor through another virtual alias (they are
 different issues.)  The same issues apply regardless of the CPL of the
 processor.

...but these are always 'there could be CPU bugs around' issues,
right? Like the AMD K6. AFAICT x86 has always supported self-modifying
code without any extra barriers being needed...
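Same-CPU self-modification is one case; the harder one raised above is cross-modification, where another CPU may be executing the bytes being rewritten. One well-known way around it, and the ordering the Linux kernel's x86 `text_poke_bp()` follows, is to route the update through a breakpoint so no CPU can ever fetch a torn instruction. This toy Python model only shows the ordering; `patch_with_breakpoint` and `sync_all_cpus` are hypothetical names, and the trap handler that emulates or skips the instruction while the INT3 is in place is assumed, not modeled.

```python
from typing import Callable, List

INT3 = 0xCC  # x86 one-byte breakpoint opcode


def patch_with_breakpoint(text: bytearray, off: int, new: bytes,
                          sync_all_cpus: Callable[[], None]) -> None:
    """Replace the same-length instruction at `off` with `new`.

    `sync_all_cpus` is a placeholder for forcing every CPU to serialize
    (e.g. via an IPI); it is not a real API.
    """
    text[off] = INT3                        # 1. any CPU reaching `off` now traps
    sync_all_cpus()
    text[off + 1:off + len(new)] = new[1:]  # 2. write the tail behind the trap
    sync_all_cpus()
    text[off] = new[0]                      # 3. publish the new first byte
    sync_all_cpus()


old = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])  # 5-byte NOP
new = bytes([0xE8, 0x10, 0x00, 0x00, 0x00])  # call rel32 (+0x10)
text = bytearray(b"\x90" + old + b"\xc3")
seen: List[bytes] = []
patch_with_breakpoint(text, 1, new, lambda: seen.append(bytes(text[1:6])))
for insn in seen:
    # every state another CPU could observe: old, trapped, or new
    assert insn == old or insn == new or insn[0] == INT3
print(text.hex())  # 90e810000000c3
```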

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html