Re: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 07:53 +0100, Peter Zijlstra wrote:
 On Fri, 2010-01-22 at 12:54 +0530, Ananth N Mavinakayanahalli wrote:
  On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote:
   Here is a summary of the Comments and actions that need to be taken for
   the current uprobes patchset. Please let me know if I missed or
   misunderstood any of your comments.  
   
   1. Uprobes depends on trap signal.
 Uprobes depends on trap signal rather than hooking to the global
   die notifier. It was suggested that we hook to the global die notifier.
   
 In the next version of patches, Uprobes will use the global die
 notifier and look at the per-task count of the probes in use to
 see if it has to be consumed.
   
 However this would reduce the ability of uprobe handlers to
 sleep. Since we are dealing with userspace, sleeping in handlers
 would have been a good feature. We are looking at ways to get
 around this limitation.
  
  We could set a TIF_ flag in the notifier to indicate a breakpoint hit
  and process it in task context before the task heads into userspace.
 
 OK, so we can go play stack games in the INT3 interrupt handler by
 moving to a non IST stack when it comes from userspace, or move kprobes
 over to INT1 or something.

Right, it just got pointed out that INT1 doesn't have a single byte
encoding, only INT0 and INT3 :/



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
 
 On Wed, 27 Jan 2010, Peter Zijlstra wrote:
  
  Right, so you're going to love uprobes, which does exactly that. The
  current proposal is overwriting the target instruction with an INT3 and
  injecting an extra vma into the target process's address space
  containing the original instruction(s) and possible jumps back to the
  old code stream.
 
 Just out of interest, how does it handle the threading issue?
 
 Last I saw, at least some CPU people were _very_ nervous about overwriting 
 instructions if another CPU might be just about to execute them.
 
 Even the overwrite only the first byte with 'int3' made them go umm, I 
 need to talk to some core CPU people to see if that's ok. They mumble 
 about possible CPU errata, I$ coherency, instruction retry etc.
 
 I realize kprobes does this very thing, but kprobes is esoteric stuff and 
 doesn't have much choice. In user space, you _could_ do the modification 
 on a different physical page and then just switch the page table entry 
 instead, and not get into the whole D$/I$ coherency thing at all.

Right, so there's two aspects:

 1) concurrency when inserting the probe
 2) concurrency when hitting the probe

1) used to be dealt with by using utrace to stop all threads in the
process and then writing the instruction. I suggested to CoW the page,
modify the instruction, set the pagetable and flush tlbs at full speed
-- the very thing you suggest here.

2) so traditionally (and the intel arch manual describes this) is to
replace the instruction, single step it, and write the probe back. This
is racy for multi-threading. The current uprobes stuff solves this by
doing single-step-out-of-line (XOL).

XOL injects a new vma into the target process and puts the old
instruction there, then it single steps on the new location, leaving the
original site with INT3.

This doesn't work for things like RIP relative instructions, so uprobes
considers them un-probable.

Also, I myself really object to inserting a vma in a running process,
its like a land-lord, sure he has the key but he won't come in an poke
through your things.

The alternative is to place the instruction in TLS or stack space, since
each thread can only have a single trap at a time, you only need space
for 1 instruction (plus a possible jump out to the original site). There
is the 'problem' of marking the TLS/stack executable when being probed.

Then there is the whole emulation angle, the uprobes people basically
say its too much effort to write a x86 emulator.



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 11:55 +0100, Peter Zijlstra wrote:
 Right, so there's two aspects:
 
  1) concurrency when inserting the probe
  2) concurrency when hitting the probe
 
 1) used to be dealt with by using utrace to stop all threads in the
 process and then writing the instruction. I suggested to CoW the page,
 modify the instruction, set the pagetable and flush tlbs at full speed
 -- the very thing you suggest here. 

Also, since executable maps are typically MAP_PRIVATE, you have to CoW
anyway in order to modify it and I would exclude MAP_SHARED from being
probable because then the modification could seep through into whatever
was backing that thing.



Re: linux-next: add utrace tree

2010-01-27 Thread Peter Zijlstra
On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote:
 Probing RIP-relative instructions work just fine; there are fixups that
 take care of it. 

Ah my bad then, it was my understanding you simply bailed on those.

Just for my information, how large are the replacement sequences?



Re: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]

2010-01-22 Thread Peter Zijlstra
On Fri, 2010-01-22 at 12:54 +0530, Ananth N Mavinakayanahalli wrote:
 On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote:
  Here is a summary of the Comments and actions that need to be taken for
  the current uprobes patchset. Please let me know if I missed or
  misunderstood any of your comments.  
  
  1. Uprobes depends on trap signal.
  Uprobes depends on trap signal rather than hooking to the global
  die notifier. It was suggested that we hook to the global die notifier.
  
  In the next version of patches, Uprobes will use the global die
  notifier and look at the per-task count of the probes in use to
  see if it has to be consumed.
  
  However this would reduce the ability of uprobe handlers to
  sleep. Since we are dealing with userspace, sleeping in handlers
  would have been a good feature. We are looking at ways to get
  around this limitation.
 
 We could set a TIF_ flag in the notifier to indicate a breakpoint hit
 and process it in task context before the task heads into userspace.

Make that optional, not everybody might want that. Either provide a
simple trampoline or use a flag to indicate the callback be called from
process context on registration.



Re: linux-next: add utrace tree

2010-01-22 Thread Peter Zijlstra
On Fri, 2010-01-22 at 15:01 -0500, Frank Ch. Eigler wrote:
 So then there's uprobes, which is another potential utrace killer
 app

That's bollocks, uprobes is an utter and total mis-match for utrace.
Probing userspace is primarily about DSOs which is files and vma's, not
tasks.

You might maybe want a utrace interface to that, but that is largely
non-interesting.

IOW, we don't need utrace to make sensible use of uprobes.

(And when I speak of uprobes I mean the thing formerly called UBP)



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-20 Thread Peter Zijlstra
On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote:
 1. Write a trace entry into shared memory, trap into the kernel on overflow.
 2. Trap if a condition is satisfied (fast watchpoint implementation).

So now you want to consume more of a process' address space to store
trace data as well? Not to mention that that process could wreck the
trace data rendering it utterly unreliable.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Peter Zijlstra
On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote:
 
 You've made it clear that you don't like it, but not why.
 
 The kernel already manages the user's address space (except for 
 MAP_FIXED which is unreliable unless you've already reserved the address 
 space).  I don't see why adding a vma for debugging is so horrible. 

Well, the kernel only does what the user (and loader) tell it through
mmap(). Other than that we never (except this VDSO thing) inject vmas,
and I see no reason to start doing that now.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Peter Zijlstra
On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote:
 If we reserve some address space, you don't add any heisenbugs (at 
 least, not any additional ones over emulation).  Even if we don't, 
 address space layout randomization means we're not keeping the address 
 space layout constant between runs anyway. 

Well, it still limits the number of probes to the reserved area. If you
want more you need to grow the area.. which then changes the state.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Peter Zijlstra
On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
 
 Maybe you place no value on uprobes.  But people who debug userspace 
 likely will see a reason.

I do see value in uprobes, I just don't like it mucking about with the
address space. Nor does it appear required. 



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Peter Zijlstra
On Mon, 2010-01-18 at 14:17 +0200, Avi Kivity wrote:
 On 01/18/2010 02:13 PM, Pekka Enberg wrote:
  So how big chunks of the address space are we talking here for uprobes?
 
 
 That's for the authors to answer, but at a guess, 32 bytes per probe 
 (largest x86 instruction is 15 bytes), so 32 MB will give you a million 
 probes.  That's a piece of cake for x86-64, probably harder to justify 
 for i386.

Yeah, I'm aware of people turning off address space randomization to
gain more virtual space on i386, I'm pretty sure those folks aren't
going to be happy if we shrink it.

Let alone them trying to probe their app.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Peter Zijlstra
On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote:
 On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
 
  Well, the alternatives are very unappealing.  Emulation and
  single-stepping are going to be very slow compared to a couple of jumps.
   
  With CPL2 or RPL on user segments the protection issue seems to be
  manageable for running the instructions from kernel space.
 
 
 CPL2 gives unrestricted access to the kernel address space; and RPL does 
 not affect page level protection.  Segment limits don't work on x86-64.  
 But perhaps I missed something - these things are tricky.

So setting RPL to 3 on the user segments allows access to kernel pages
just fine? How useful.. :/

 It should be possible to translate the instruction into an address space 
 check, followed by the action, but that's still slower due to privilege 
 level switches.

Well, if you manage to do the address validation you don't need the priv
level switch anymore, right?

Are the ins encodings sane enough to recognize mem parameters without
needing to know the actual ins?

How about using a hw-breakpoint to close the gap for the inline single
step? You could even re-insert the int3 lazily when you need the
hw-breakpoint again. It would consume one hw-breakpoint register for
each task/cpu that has probes though..



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote:
 On 01/17/2010 04:52 PM, Peter Zijlstra wrote:

  Also, if its fixed size you're imposing artificial limits on the number
  of possible probes.
 
 
 Obviously we'll need a limit, a uprobe will also take kernel memory, we 
 can't allow people to exhaust it.

Only if its unprivilidged, kernel and root should be able to place as
many probes until the machine keels over.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 16:59 +0200, Avi Kivity wrote:
 On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
  On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
 
  On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
   
  As previously stated, I think poking at a process's address space is an
  utter no-go.
 
 
  Why not reserve an address space range for this, somewhere near the top
  of memory?  It doesn't have to be populated if it isn't used.
   
  Because I think poking at a process's address space like that is gross.
  Also, if its fixed size you're imposing artificial limits on the number
  of possible probes.
 
 
 btw, an alternative is to require the caller to provide the address 
 space for this.  If the caller is in another process, we need to allow 
 it to play with the target's address space (i.e. mmap_process()).  I 
 don't think uprobes justifies this by itself, but mmap_process() can be 
 very useful for sandboxing with seccomp.

mmap_process() sounds utterly gross, one process playing with another
process's address space.. yuck!



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sat, 2010-01-16 at 18:48 -0500, Jim Keniston wrote:

 As you may have noted before, I think FP would be a special problem  
 for your approach.  I'm not sure how folks would react to the idea of  
 executing FP instructions in kernel space.  But emulating them is also  
 tough.  There's an IEEE FP emulation package somewhere in one of the  
 Linux arch directories, but I'm not sure how precise it is, and  
 dropping even 1 bit of precision is unacceptable for many  
 applications, since such errors tend to grow in complex computations  
 employing many FP instructions.

Well, we have kernel space using FP/MMX/SSE like things, its not hard if
you really need it, but in this case I think its easier than normal,
because we'll just allow it to change the userspace state because that
is exactly what we want it to do.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sat, 2010-01-16 at 19:12 -0500, Bryan Donlan wrote:
 On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston jkeni...@us.ibm.com wrote:
 
  4. Emulation removes the need for the XOL area, but requires pretty much
  total knowledge of the instruction set.  It's also a performance win for
  architectures that can't do #3.  I see kvm implemented on 4
  architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
  architectures to which uprobes (old uprobes, with ubp and xol bundled
  in) has already been ported (though Intel hasn't been maintaining their
  ia64 port).  So it sort of comes down to how objectionable the XOL vma
  (or page) really is.
 
 On x86 at least, wouldn't one option to be to run the instruction to
 be emulated in CPL ('ring') 2, from a XOL page above the user-kernel
 split, not accessible to userspace at CPL 3? Linux hasn't
 traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on
 Xen), but it would seem to avoid many of the problems here - it's
 invisible to normal userspace code and so doesn't pollute userspace
 memory maps with kernel-private stuff, but since it's running at a
 higher CPL than the kernel, we can still protect kernel memory and
 protect against privileged instructions.

Another option is to go play games with the RPL of the user data
segments when we load them. But yeah, something like this seems to
nicely deal with the protection issues.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-17 Thread Peter Zijlstra
On Sun, 2010-01-17 at 21:33 +0200, Avi Kivity wrote:
 On 01/17/2010 05:03 PM, Peter Zijlstra wrote:
 
  btw, an alternative is to require the caller to provide the address
  space for this.  If the caller is in another process, we need to allow
  it to play with the target's address space (i.e. mmap_process()).  I
  don't think uprobes justifies this by itself, but mmap_process() can be
  very useful for sandboxing with seccomp.
   
  mmap_process() sounds utterly gross, one process playing with another
  process's address space.. yuck!
 
 
 This is debugging.  We're playing with registers, we're playing with the 
 cpu, we're playing with memory contents.  Why not the address space as well?

Because you want thins go to be as transparent as possible in order to
avoid heisenbugs. Sure we cannot avoid everything, but we should avoid
everything we possibly can.

Also, aside of the VDSO, we simply do not force map things into address
spaces (and like said before, I think the VDSO stinks for doing that)
and I think we don't want to create (more) precedents in this case.





Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Peter Zijlstra
On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
 
  discussed elsewhere. 

Thanks for the pointer...



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Thu, 2010-01-14 at 14:49 -0800, Jim Keniston wrote:
 On Thu, 2010-01-14 at 12:09 +0100, Peter Zijlstra wrote:
  On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
   
   Uprobes Infrastructure enables user to dynamically establish
   probepoints in user applications and collect information by executing
   a handler functions when the probepoints are hit.
   Please refer Documentation/uprobes.txt for more details.
   
   This patch provides the core implementation of uprobes.
   This patch builds on utrace infrastructure.
   
   You need to follow this up with the uprobes patch for your
   architecture. 
  
  So all this is basically some glue between what you call ubp (the real
  userspace breakpoint stuff) and utrace? Or does it do more?
  
 
 My reply in
 http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/02483.html
 addresses this.

Right, so all that need be done is add the multiple probe stuff to UBP
and its a sane interface to use on its own, at which point I'd be
inclined to call that uprobes (UBP really is an crap name).

Then we can ditch the whole utrace muck as I see no reason to want to
use that, whereas the ubp (given a sane name) looks interesting.



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 04:26 -0500, Frank Ch. Eigler wrote:
 Peter Zijlstra pet...@infradead.org writes:
 
  [...]
  Right, so all that need be done is add the multiple probe stuff to UBP
  and its a sane interface to use on its own, at which point I'd be
  inclined to call that uprobes (UBP really is an crap name).
 
 At one point ubp+uprobes were one piece.  They were separated on the
 suspicion that lkml would like them that way.

Right, good thinking, that way we can use ubp without having to use
utrace ;-)

  Then we can ditch the whole utrace muck as I see no reason to want to
  use that, whereas the ubp (given a sane name) looks interesting.
 
 Assuming you meant what you write, perhaps you misunderstand the
 layering relationship of these pieces.  utrace underlies uprobes and
 other process manipulation functionality, present and future.

Why, utrace doesn't at all look to bring a fundamental contribution to
all that. If there's a proper kernel interface to install probes on
userspace code (ubp seems to be mostly that) then I can use perf/ftrace
to do the rest of the state management, no need to use utrace there.

You can hardly force me to use utrace there, can you?



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote:
 On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
  On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
   
discussed elsewhere. 
  
  Thanks for the pointer...
 
 :-)
 
 Peter,
 I think Jim was referring to
 http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html

That's a 2007 email from some obscure list... that's hardly something
that can be referenced to without link.

As previously stated, I think poking at a process's address space is an
utter no-go.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:

 Ideas?

emulate the one instruction?



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote:
 Hi Peter, 
 
 Or there could be two threads that could be racing to
 insert/delete a breakpoint. These synchronization issues are all handled
 by the Uprobes layer.

Shouldn't be hard to put that in the ubp layer, right?

 Uprobes layer would need to be notified of process life-time events
 like fork/clone/exec/exit. 

No so much the process lifetimes as the vma life times are interesting,
placing a hook in the vm code to track that isn't too hard, 

 It also needs to know
   -  when a breakpoint is hit
   -  stop and resume a thread.

A simple hook in the trap code is done quickly enough, and no reason to
stop the thread, its not going anywhere when it traps.




Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 15:52 +0530, Ananth N Mavinakayanahalli wrote:
 On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote:
  On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:
  
   Ideas?
  
  emulate the one instruction?
 
 In kernel? Generically? Don't think its that easy for userspace --
 you have the full gamut of instructions to emulate (fp, vector, etc);
 further,

Can't you jit a piece of code that wraps the one instruction, save the
full cpu state, set the userspace segments, have it load pt_regs (except
for the IP) execute the one ins, save the results, restore the full
state?

Then replace pt_regs with the saved result and advance the stored IP by
the length of that one instruction and return to userspace?

All you need to take care of are the priv insns, but doesn't something
like kvm already have code to deal with that?

  the instruction could itself cause a page fault and the like.

Faults aren't a problem, we take faults from kernel space all the time.



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 16:35 +0530, Maneesh Soni wrote:
 On Fri, Jan 15, 2010 at 11:33:27AM +0100, Peter Zijlstra wrote:
  On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote:
   Hi Peter, 
   
   Or there could be two threads that could be racing to
   insert/delete a breakpoint. These synchronization issues are all handled
   by the Uprobes layer.
  
  Shouldn't be hard to put that in the ubp layer, right?
  
   Uprobes layer would need to be notified of process life-time events
   like fork/clone/exec/exit. 
  
  No so much the process lifetimes as the vma life times are interesting,
  placing a hook in the vm code to track that isn't too hard, 
  
 
 I think similar hooks were given thumbs down in the previous incarnation
 of uprobes (which was implemented without utrace).  
 
 http://lkml.indiana.edu/hypermail/linux/kernel/0603.2/1254.html

I wasn't at all proposing to mess with a_ops, nor do you really need to,
I was more thinking of adding a callback like perf_event_mmap() and a
corresponding unmap(), that way you can track mapping life-times and
add/remove probes accordingly.

Adding the probe uses the fact that (most) executable mappings are
MAP_PRIVATE and CoWs a private copy of the page with the modified ins,
right?

What does it do for MAP_SHARED|MAP_EXECUTABLE sections -- simply fail to
add the probe?



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 12:12 +0100, Peter Zijlstra wrote:
 On Fri, 2010-01-15 at 16:35 +0530, Maneesh Soni wrote:
  On Fri, Jan 15, 2010 at 11:33:27AM +0100, Peter Zijlstra wrote:
   On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote:
Hi Peter, 

Or there could be two threads that could be racing to
insert/delete a breakpoint. These synchronization issues are all handled
by the Uprobes layer.
   
   Shouldn't be hard to put that in the ubp layer, right?
   
Uprobes layer would need to be notified of process life-time events
like fork/clone/exec/exit. 
   
   No so much the process lifetimes as the vma life times are interesting,
   placing a hook in the vm code to track that isn't too hard, 
   
  
  I think similar hooks were given thumbs down in the previous incarnation
  of uprobes (which was implemented without utrace).  
  
  http://lkml.indiana.edu/hypermail/linux/kernel/0603.2/1254.html
 
 I wasn't at all proposing to mess with a_ops, nor do you really need to,
 I was more thinking of adding a callback like perf_event_mmap() and a
 corresponding unmap(), that way you can track mapping life-times and
 add/remove probes accordingly.
 
 Adding the probe uses the fact that (most) executable mappings are
 MAP_PRIVATE and CoWs a private copy of the page with the modified ins,
 right?

Does it clean up the CoW'ed page on removing the probe? Does that
account for userspace having made other changes in between installing
and removing the probe (for PROT_WRITE mappings obviously)?



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 19:50 +0530, Srikar Dronamraju wrote:

  Srikar seemed to suggest it needed stop/resume.
  
 
 If process traps, We dont need to stop/resume other threads.
 uprobes needs threads to quiesce when inserting/deleting the breakpoint.

Right, I don't think we need to at all. See the CoW thing from previous
emails.



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 09:22 -0500, Frank Ch. Eigler wrote:
 Hi -
 
   Well, I'm not in a position to argue line by line about the necessity
   or the cost of utrace low level guts, but this may represent the most
   practical engineering balance between functionality / modularity /
   undesirably intrusive modifications. 
  
  How intrusive and non-modular is installing a DIE_INT3 notifier?
 
 I'm not sure about all the reasons pro/con, but it looks like
 installing such a systemwide hook would force every userspace
 breakpoint or kprobe event machine wide to pass through the
 hypothetical uprobes layer, whether or not applicable to a current
 task.

Well, we'll have to pass through the global die notifier anyway, but a
quick per task filter sounds like a good idea, we can do that by keeping
a per-task count of the number of uprobes in use.

Then the uprobe code can avoid the lookup if there are no task users and
no global users.

The advantage of this construct is that is easily allows for global
users, whereas a utrace based one doesn't.





Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Peter Zijlstra
On Fri, 2010-01-15 at 13:07 -0800, Jim Keniston wrote:
 On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote:
  On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
   
   +Instruction copies to be single-stepped are stored in a per-process
   +single-step out of line (XOL) area, which is a little VM area
   +created by Uprobes in each probed process's address space.
  
  I think tinkering with the probed process's address space is a no-no.
  Have you ran this by the linux mm folks?
 
 Sort of.
 
 Back in 2007 (!), we were getting ready to post uprobes (which was then
 essentially uprobes+xol+upb) to LKML, pondering XOL alternatives and
 waiting for utrace to get pulled back into the -mm tree.  (It turned out
 to be a long wait.)  I emailed Andrew Morton, inquiring about the
 prospects for utrace and giving him a preview of utrace-based uprobes.
 He expressed openness to the idea of allocating a piece of the user
 address space for the XOL area, a la the vdso page.
 
 With advice and review from Dave Hansen, we implemented an XOL page, set
 up for every process (probed or not) along the same lines as the vdso
 page.
 
 About that time, Roland McGrath suggested using do_mmap_pgoff() to
 create a separate vma on demand.  This was the seed of the current
 implementation.  It had the advantages of being
 architecture-independent, affecting only probed processes, and allowing
 the allocation of more XOL slots.  (Uprobes can make do with a fixed
 number of XOL slots -- allowing one probepoint to steal another's slot
 -- but it isn't pretty.)
 
 As I recall, Dave preferred the other idea (1 XOL page for every
 process, probed or not) -- mostly because he didn't like the idea of a
 new vma popping into existence when the process gets probed -- but was
 OK with us going ahead with Roland's idea.

Well, I think its all very gross, I would really like people to try and
'emulate' or plain execute those original instructions from kernel
space.

As to the privileged instructions, I think qemu/kvm like projects should
have pretty much all of that covered.

Nor do I think we need utrace at all to make user space probes useful.
Even stronger, I think the focus on utrace made you get some
fundamentals wrong. Its not mainly about task state, but like said, its
about text mappings, which is something utrace knows nothing about.

That is not to say you cannot build a useful interface from uprobes and
utrace, but its not at all required or natural.



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-14 Thread Peter Zijlstra
On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
 User Space Breakpoint Assistance Layer (UBP)
 
 User space breakpointing Infrastructure provides kernel subsystems
 with architecture independent interface to establish breakpoints in
 user applications. This patch provides core implementation of ubp and
 also wrappers for architecture dependent methods.

So if this is the basic infrastructure to set userspace breakpoints,
then why not call this uprobe?

 UBP currently supports both single stepping inline and execution out
 of line strategies. Two different probepoints in the same process can
 have two different strategies.

maybe explain wth these are?



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-14 Thread Peter Zijlstra
On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
 
 Uprobes Infrastructure enables user to dynamically establish
 probepoints in user applications and collect information by executing
 a handler functions when the probepoints are hit.
 Please refer Documentation/uprobes.txt for more details.
 
 This patch provides the core implementation of uprobes.
 This patch builds on utrace infrastructure.
 
 You need to follow this up with the uprobes patch for your
 architecture. 

So all this is basically some glue between what you call ubp (the real
userspace breakpoint stuff) and utrace? Or does it do more?



Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-14 Thread Peter Zijlstra
On Tue, 2010-01-12 at 13:44 +0530, Ananth N Mavinakayanahalli wrote:
 
 Well, I wonder if perf can ride on utrace's callbacks for the
 hook_task() for the clone/fork cases? 

Well it could, but using all of utrace to simply get simple callbacks
from copy_process() is just daft so we're not going to do that.



Re: [RFC] [PATCH 3/7] Execution out of line (XOL)

2010-01-14 Thread Peter Zijlstra
On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
 Execution out of line (XOL)
 
 Slot allocation mechanism for Execution Out of Line strategy in User
 space breakpointing Inftrastructure. (XOL)
 
 This patch provides slot allocation mechanism for execution out of
 line strategy for use with user space breakpoint infrastructure.
 This patch requires utrace support in kernel.
 
 This patch provides five functions xol_get_insn_slot(),
 xol_free_insn_slot(), xol_put_area(), xol_get_area() and
 xol_validate_vaddr().
 
 Current slot allocation mechanism:
 1. Allocate one dedicated slot per user breakpoint.
 2. If the allocated vma is completely used, expand current vma.
 3. If we cant expand the vma, allocate a new vma.


Say what?

I see the text, but non of it makes any sense at all.



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-14 Thread Peter Zijlstra
On Thu, 2010-01-14 at 12:23 +0100, Peter Zijlstra wrote:
 On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote:
  This patch implements ftrace plugin for uprobes.
 
 Right, like others have said, trace events is a much saner interface.
 
 So the easiest way I can see that working is to register uprobes against
 a file (not a pid).

Just to clarify, this means you can do things like:

 p:uprobe_event dso:symbol[+offs]

Irrespective of whether there are any current user of that file.



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-14 Thread Peter Zijlstra
On Thu, 2010-01-14 at 12:35 +0100, Frederic Weisbecker wrote:
 On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote:
  On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote:
   This patch implements ftrace plugin for uprobes.
  
  Right, like others have said, trace events is a much saner interface.
  
  So the easiest way I can see that working is to register uprobes against
  a file (not a pid). Then on creation it uses rmap to find all current
  maps of that file and install the probe if there is a consumer for that
  map.
  
  Then for each new mmap() of that file, we also need to check if there's
  a consumer ready and install the probe.
 
 
 
 That looks racy.
 
 Say you first create a probe on /bin/ls:
 
 perf probe p addr_in_ls /bin/ls
 
 then something else launches /bin/ls behind you, probe
 is set on it
 
 then you launch:
 
 perf record -e probe: /bin/ls
 
 Then it goes recording the previous instance.

Uhm, why? Only the perf /bin/ls instance has a consumer and will thus
have a probe installed.

(Or if you want to use ftrace you need to always have all instances
probed anyway)



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-14 Thread Peter Zijlstra
On Thu, 2010-01-14 at 13:16 +0100, Mark Wielaard wrote:
 On Thu, 2010-01-14 at 12:29 +0100, Peter Zijlstra wrote:
  On Thu, 2010-01-14 at 12:23 +0100, Peter Zijlstra wrote:
   On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote:
This patch implements ftrace plugin for uprobes.
   
   Right, like others have said, trace events is a much saner interface.
   
   So the easiest way I can see that working is to register uprobes against
   a file (not a pid).
  
  Just to clarify, this means you can do things like:
  
   p:uprobe_event dso:symbol[+offs]
  
  Irrespective of whether there are any current user of that file.
 
 Yes, that is a good idea, you can then also refine that with a filter on
 a target pid. That is what systemtap also does, you define files
 (whether they are executables or shared libraries, etc) plus
 symbols/offsets/etc as targets and monitor when they get mapped in
 (either system wide, per executable or pid based).

Well, the pid part is more the concern of the consumer of the
trace-event. The event itself is task invariant and only cares about the
particular code in question getting executed.



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Peter Zijlstra
On Sun, 2009-12-13 at 16:25 -0800, Roland McGrath wrote:
 I'm sorry for the delay.  I'm picking up with responding to the parts of
 your review that I did not include in my first reply.  I appreciate very
 much the discussion you've had with Oleg about the issues that I did not
 address myself.  I look forward to your replies to my comments in that
 first reply, which we have yet to see.

Yeah, no worries, I'm not too quick these days myself..

  Seems inconsistent on u32 vs enum utrace_resume_action.
  
  Why force enum utrace_resume_action into a u32?
 
 The first argument to almost all callbacks (all the ones made when the
 task is alive) is called action and it's consistent that its low bits
 contain an enum utrace_resume_action.  The argument is u32 action in
 the report_signal and report_syscall_entry callbacks, where it combines
 an enum utrace_resume_action with an enum utrace_{signal,syscall}_action
 (respectively).  It would indeed be more consistent to use u32 action
 throughout, but it seemed nicer not to have engine writers always
 writing utrace_resume_action(action) for all the cases where there are
 no other bits in there.

C does implicit casts from enum to integer types, right? So always using
u32 here would not impose any extra typing on the user, or am I missing
something subtle here?

 The return value is a mirror of the u32 action argument.  In many
 calls, it has only an enum utrace_resume_action in it.  But in some it
 combines that with another enum utrace_*_action.  There I went for
 consistency (and less typing) in the return type, since it doesn't make
 any difference to how you have to write the code in your callback
 function.  The main reason I used u32 at all instead of unsigned int
 is just its shortness for less typing.
 
 I don't really mind changing these details to whatever people think is
 best.  The other people writing code to use the utrace API may have more
 opinions than I do.  I guess it could even be OK to make the return
 value and action argument always a plain enum utrace_resume_action,
 and use a second in/out enum utrace_{signal,syscall}_action *
 parameter in those particular calls.  But that does put some more
 register pressure and loads/stores into this API.

I don't mind the sharing of the argument, it just looked odd to have the
u32/unsigned int/enum thing intermixed, since you care about typing
length (as good a criteria as any) I'd just be lazy and make everything
u32 ;-)

  Quite gross this.. can't we key off the
  tracehoook_report_clone_complete() and use a wakeup there?
 
 Yes, we intended to clean this up at some point later.  But doing that
 is just a distraction and complication right now so we put it off.  For
 this case, however, I suppose we could just punt for the initial version.
 
 This code exists to support the special semantics of calling
 utrace_attach_task() from inside the parent's report_clone() callback.
 That guarantees the parent that it wins any race with any third thread
 calling utrace_attach_task().  

 This guarantees it will be first attacher
 in the callback order, but the general subject of callback order is
 already something we think we will want to revisit in the future after
 we have more experience with lots of different engines trying to work
 together.  

 It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
 flag -- then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
 to synchronize with other threads trying to attach the same kind of
 engine to a task, and give special priority in that to the engine that
 attaches in report_clone() from tracing the parent.

I broke the text so it reads easier for me, it might be me, it might not
be proper English -- I'm not a native speaker -- but this is an example
of what you asked for below.

The thing is, your sentences are rather long, with lots of sub-parts and
similar. I find I need a break after digesting a few such things.

As to the content, can't you accomplish the same thing by processing
such exclusive parent registration before exposing the child in the
pid-hash? Right before cgroup_fork_callback() in
kernel/fork.c:copy_process() seems like the ideal site.

 Really I came up with this special semantics originally just with ptrace
 in mind.  ptrace is an engine that wants to exclude other tracer threads
 attaching asynchronously with the same kind of engine, and that wants to
 give special priority on a child to the parent's tracer (to implement
 PTRACE_O_TRACECLONE et al).  However, Oleg's current ptrace code still
 relies on the old hard-coded locking kludges to exclude the separate
 attachers and to magically privilege the auto-attach from the parent's
 tracer.  So we are not relying on this special semantics yet.
 
 We could just punt utrace_attach_delay() and remove the associated
 documentation about the special rights of report_clone() calling
 utrace_attach_task().  If we decide it helps clean things up later when
 we finish more 

Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Peter Zijlstra
On Tue, 2009-12-08 at 16:04 +0100, Oleg Nesterov wrote:
 The
 problem is, this code was developed out-of-tree. That is why we would
 like to merge it asap, then do other changes which could be easily
 reviewed.
 
 Now, do you really mean we should throw out the working code, rewrite
 it avoiding these barriers, and resubmit? Sure, everything is possible.
 But this means another round of out-of-tree development with unclear
 results. 

Out-of-tree development is bad, it having taken lot of effort is no
excuse for merging ugly.

Now, I'm not against barriers at all, but code that is as barrier heavy
as this, with such bad comments and no clear indication it was actually
worth using so many barriers make me wonder.

Barriers aren't free either, and having multiple such things in quick
succession isn't nessecarily faster than a lock, but much less obvious.





Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Peter Zijlstra
On Tue, 2009-12-08 at 17:31 +0100, Oleg Nesterov wrote:

  If you take a task ref you can write the much saner:
 
  utrace_control()
  {
...
spin_lock(utrace-lock);
...
if (reset)
  utrace_reset(utrace);
 
spin_unlock(utrace-lock);
  }
 
 No, get_task_struct() in utrace_reset() can't help, we should move
 it into utrace_control() then. And in this case it becomes even more
 subtle: it is needed because -utrace_flags may be cleared inside
 utrace_reset() and after that utrace_control()-spin_unlock() becomes
 unsafe.

The task-utrace pointer is cleaned up on
free_task()-tracehook_free_task()-utrace_free_task(), so by holding a
ref on the task, we ensure -utrace stays around, and we can do
spin_unlock(), right?

 Also. utrace_reset() drops utrace-lock to call put_detached_list()
 lockless. If we want to avoid the assymetric locking, every caller
 should pass struct list_head *detached to utrace_reset(), drop
 utrace-lock, and call put_detached_list().

All that seems to do is call -release() and kmem_cache_free()s the
utrace_engine thing, why can't that be done with utrace-lock held?

But yeah, passing that list along does seem like a better solution.



Re: [RFC,PATCH 14/14] utrace core

2009-12-07 Thread Peter Zijlstra
On Tue, 2009-12-01 at 23:08 +0100, Oleg Nesterov wrote:

   @@ -560,6 +625,20 @@ static inline void tracehook_report_deat
   int signal, void *death_cookie,
   int group_dead)
{
   + /*
   +  * This barrier ensures that our caller's setting of
   +  * @task-exit_state precedes checking @task-utrace_flags here.
   +  * If utrace_set_events() was just called to enable
   +  * UTRACE_EVENT(DEATH), then we are obliged to call
   +  * utrace_report_death() and not miss it.  utrace_set_events()
   +  * uses tasklist_lock to synchronize enabling the bit with the
   +  * actual change to @task-exit_state, but we need this barrier
   +  * to be sure we see a flags change made just before our caller
   +  * took the tasklist_lock.
   +  */
   + smp_mb();
   + if (task_utrace_flags(task)  _UTRACE_DEATH_EVENTS)
   + utrace_report_death(task, death_cookie, group_dead, signal);
}
 
  I don't think its allowed to pair a mb with a lock-barrier, since the
  lock barriers are semi-permeable.
 
 Could you clarify?

According to memory-barriers.txt a mb can be paired with mb,wmb,rmb
depending on the situation.

For LOCK it states:

(1) LOCK operation implication:

 Memory operations issued after the LOCK will be completed after the LOCK
 operation has completed.

 Memory operations issued before the LOCK may be completed after the LOCK
 operation has completed.

Which is not something I can see making sense to pair with an mb.

So either the comment is confusing and its again referring to an UNLOCK
+LOCK pair, or there's something fishy

   @@ -589,10 +668,20 @@ static inline void set_notify_resume(str
 * asynchronously, this will be called again before we return to
 * user mode.
 *
   - * Called without locks.
   + * Called without locks.  However, on some machines this may be
   + * called with interrupts disabled.
 */
static inline void tracehook_notify_resume(struct pt_regs *regs)
{
   + struct task_struct *task = current;
   + /*
   +  * This pairs with the barrier implicit in set_notify_resume().
   +  * It ensures that we read the nonzero utrace_flags set before
   +  * set_notify_resume() was called by utrace setup.
   +  */
   + smp_rmb();
   + if (task_utrace_flags(task))
   + utrace_resume(task, regs);
}
 
  Sending an IPI implies the mb?
 
 Yes, but this has nothing to do with IPI. The caller, do_notify_resume(),
 does:
   clear_thread_flag(TIF_NOTIFY_RESUME);
   tracehook_notify_resume:
   if (task_utrace_flags(task))
   utrace_resume();
 
 We should not read task_utrace_flags() before we clear TIF_NOTIFY_RESUME.

Then this comment needs an update.. that wasn't at all clear to me.

   +static inline struct utrace *task_utrace_struct(struct task_struct *task)
   +{
   + struct utrace *utrace;
   +
   + /*
   +  * This barrier ensures that any prior load of task-utrace_flags
   +  * is ordered before this load of task-utrace.  We use those
   +  * utrace_flags checks in the hot path to decide to call into
   +  * the utrace code.  The first attach installs task-utrace before
   +  * setting task-utrace_flags nonzero, with a barrier between.
   +  * See utrace_task_alloc().
   +  */
   + smp_rmb();
   + utrace = task-utrace;
   +
   + smp_read_barrier_depends(); /* See utrace_task_alloc().  */
   + return utrace;
   +}
 
  I spot two barriers here, but only 1 over in utrace_task_alloc(), hmm?
 
 smp_read_barrier_depends() pairs with utrace_task_alloc()-wmb().
 
 smp_rmb() is needed for another reason. Suppose the code does:
 
   if (task_utrace_flags()  SOMETHING)
   do_something_with(task-utrace);
 
 if we race with utrace_attach_task(), we can see -utrace_flags != 0
 but task-utrace == NULL without rmb().

Still not clear what we pair with.. There is no obvious barrier in
utrace_attach_task() nor a comment referring to this site.

   +struct utrace_engine {
   +/* private: */
   + struct kref kref;
   + void (*release)(void *);
   + struct list_head entry;
   +
   +/* public: */
   + const struct utrace_engine_ops *ops;
   + void *data;
   +
   + unsigned long flags;
   +};
 
  Sorry, the kernel is written in C, not C++.
 
 Hmm. I almost never read the comments, but these 2 look very clear
 to me ;)

I see no point in adding comments like that at all.

   + * Most callbacks take an @action argument, giving the resume action
   + * chosen by other tracing engines.  All callbacks take an @engine
   + * argument, and a @task argument, which is always equal to @current.
 
  Given that some functions have a lot of arguments (depleting regparam),
  isn't it more expensive to push current on the stack than it is to
  simply read it again?
 
 Yes, perhaps. Only -report_reap() really needs @task, it may be
 !current.
 
   +struct utrace_engine_ops {
 
   + u32 (*report_signal)(u32 action,
   +  struct 

Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Peter Zijlstra
On Tue, 2009-11-24 at 21:02 +0100, Oleg Nesterov wrote:

 +  sect2 id=reaptitleFinal callbacks/title
 +  para
 +The functionreport_reap/function callback is always the final event
 +in the life cycle of a traced thread.  Tracing engines can use this as
 +the trigger to clean up their own data structures.  The
 +functionreport_death/function callback is always the penultimate
 +event a tracing engine might see; it's seen unless the thread was
 +already in the midst of dying when the engine attached.  Many tracing
 +engines will have no interest in when a parent reaps a dead process,
 +and nothing they want to do with a zombie thread once it dies; for
 +them, the functionreport_death/function callback is the natural
 +place to clean up data structures and detach.  To facilitate writing
 +such engines robustly, given the asynchrony of
 +constantSIGKILL/constant, and without error-prone manual
 +implementation of synchronization schemes, the
 +applicationutrace/application infrastructure provides some special
 +guarantees about the functionreport_death/function and
 +functionreport_reap/function callbacks.  It still takes some care
 +to be sure your tracing engine is robust to tear-down races, but these
 +rules make it reasonably straightforward and concise to handle a lot of
 +corner cases correctly.
 +  /para
 +  /sect2

This above text seems inconsistent. First you say report_reap() is the
final callback and should be used for cleanup, then you say
report_death() is the penultimate callback but might not always happen
and people would normally use that as cleanup.

If we cannot rely on report_death() then I would suggest to simply
recommend report_reap() as cleanup callback and leave it at that.

 +  para
 +There is nothing a kernel module can do to keep a structnamestruct
 +task_struct/structname alive outside of
 +functionrcu_read_lock/function. 

Sure there is, get_task_struct() comes to mind.

  When the task dies and is reaped
 +by its parent (or itself), that structure can be freed so that any
 +dangling pointers you have stored become invalid.
 +applicationutrace/application will not prevent this, but it can
 +help you detect it safely.  By definition, a task that has been reaped
 +has had all its engines detached.  All
 +applicationutrace/application calls can be safely called on a
 +detached engine if the caller holds a reference on that engine pointer,
 +even if the task pointer passed in the call is invalid.  All calls
 +return constant-ESRCH/constant for a detached engine, which tells
 +you that the task pointer you passed could be invalid now.  Since
 +functionutrace_control/function and
 +functionutrace_set_events/function do not block, you can call those
 +inside a functionrcu_read_lock/function section and be sure after
 +they don't return constant-ESRCH/constant that the task pointer is
 +still valid until functionrcu_read_unlock/function.  The
 +infrastructure never holds task references of its own.  

And here you imply its existence.

 Though neither
 +functionrcu_read_lock/function nor any other lock is held while
 +making a callback, it's always guaranteed that the structnamestruct
 +task_struct/structname and the structnamestruct
 +utrace_engine/structname passed as arguments remain valid
 +until the callback function returns.
 +  /para
 +
 +  para
 +The common means for safely holding task pointers that is available to
 +kernel modules is to use structnamestruct pid/structname, which
 +permits functionput_pid/function from kernel modules.  When using
 +that, the calls functionutrace_attach_pid/function,
 +functionutrace_control_pid/function,
 +functionutrace_set_events_pid/function, and
 +functionutrace_barrier_pid/function are available.
 +  /para

The above seems weird to me at best... why hold a pid around when you
can keep a ref on the task struct? A pid might get reused leaving you
with a completely different task than you thought you had.

 +  /sect2
 +
 +  sect2 id=reap-after-death
 +title
 +  Serialization of constantDEATH/constant and 
 constantREAP/constant
 +/title
 +para
 +  The second guarantee is the serialization of
 +  constantDEATH/constant and constantREAP/constant event
 +  callbacks for a given thread.  The actual reaping by the parent
 +  (functionrelease_task/function call) can occur simultaneously
 +  while the thread is still doing the final steps of dying, including
 +  the functionreport_death/function callback.  If a tracing engine
 +  has requested both constantDEATH/constant and
 +  constantREAP/constant event reports, it's guaranteed that the
 +  functionreport_reap/function callback will not be made until
 

Re: [RFC] [PATCH] In-kernel gdbstub based on utrace Infrastructure.

2009-11-30 Thread Peter Zijlstra
On Mon, 2009-11-30 at 17:33 +0530, Srikar Dronamraju wrote:
 This patch implements an in-kernel gdb stub.
 It provides an interface between gdb and Linux Kernel by implementing
 the remote serial protocol.  This gdbstub uses utrace infrastructure.
 This patch provides register set access, signal mapping, process event
 handling, input/output operations.
 
 /proc/pid/gdb was chosen as file for gdb to interact with the
 process through remote serial protocol. 
 
 Hence users would have to use target remote /proc/pid/gdb command
 on gdb prompt to start using this infrastructure.
 
 For Breakpointing support, gdbstub needs User space breakpointing
 layer and uprobes layer which will be posted later.

How does this compare to kgdb and related efforts?





Re: [RFC] [PATCH] In-kernel gdbstub based on utrace Infrastructure.

2009-11-30 Thread Peter Zijlstra
On Mon, 2009-11-30 at 18:02 +0530, Srikar Dronamraju wrote:
 * Peter Zijlstra pet...@infradead.org [2009-11-30 13:09:12]:
 
  On Mon, 2009-11-30 at 17:33 +0530, Srikar Dronamraju wrote:
   This patch implements an in-kernel gdb stub.
   It provides an interface between gdb and Linux Kernel by implementing
   the remote serial protocol.  This gdbstub uses utrace infrastructure.
   This patch provides register set access, signal mapping, process event
   handling, input/output operations.
   
   /proc/pid/gdb was chosen as file for gdb to interact with the
   process through remote serial protocol. 
   
   Hence users would have to use target remote /proc/pid/gdb command
   on gdb prompt to start using this infrastructure.
   
   For Breakpointing support, gdbstub needs User space breakpointing
   layer and uprobes layer which will be posted later.
  
  How does this compare to kgdb and related efforts?
 
 This is a In-kernel gdbstub to debug user space programs. 
 This stub doesnt help in debugging kernel.
 
 Hence I am not sure how to compare kgdb gdbstub with this gdbstub.
 Can you please provide more pointers on what you were referring to?

Well, not even that much was clear from your changelog, so I wasn't
really sure wtf I was looking at. All it says was an in-kernel gdb stub,
what other than to debug the kernel would you need in-kernel stubs for?

So now my question is, what do you need an in-kernel stub to debug
userspace for?

In general, tell me about this patch thing, what does it do, why does it
do it, and how does it improve on the current situation.

Your changelog doesn't address any of those things, so wth are we
supposed to think?



Re: [RFC] [PATCH] In-kernel gdbstub based on utrace Infrastructure.

2009-11-30 Thread Peter Zijlstra
On Mon, 2009-11-30 at 18:49 +0530, Srikar Dronamraju wrote:

 This is suppose to be one of the interfaces to use utrace, i.e Allow
 gdb to use utrace features without having to change gdb itself.
 Though there are not enough features in this patch, intentions include
 support multi-threaded debugging, concurrent debugger support for same
 process, syscall tracing.
 
 For Breakpoint support(not yet submitted to LKML), it would use
 execution out of line rather than the conventional inline-single
 stepping.
 
 I guess Christoph, Roland and Frank would be able to explain in a better
 fashion the rational and advantages of this stub over convential gdb.

Hmm,. wouldn't it make much more sense to extend the current kgdb stub
to include userspace debugging, providing an all-in-one solution?

I think it would be much more powerful to be able to observe the full
software stack and not be limited by this user-kernel barrier.

(Provided the user has sufficient privileges of course).



Re: [RFC,PATCH 0/14] utrace/ptrace

2009-11-26 Thread Peter Zijlstra
On Thu, 2009-11-26 at 12:37 +0530, Srikar Dronamraju wrote:
 Hi Christoph, 
 
  
  The other thing is that this patchset really doesn't quite justify
  utrace.  It's growing a lot more code without actually growing any
  useful functionality.  What about all those other utrace killer
  features that have been promised for a long time?
  
 
 We are working on in-kernel gdbstub which was one of the features that
 you had asked for. gdbstub does pass unit tests; but we are looking at
 some way to hack the GDB testsuite to run its regression tests. Once we
 are able to run the GDB testsuite and utrace is part of some upstream
 tree, we plan to post these patches to LKML for comments. gdbstub uses
 utrace and uprobes underneath. Uprobes was rewritten to remove issues
 that LKML developers had opposed. Uprobes also has its own ftrace plugin
 to use uprobes.
 
 Currently in-kernel gdbstub is hosted by Frank Ch. Eigler over here:
 git://web.elastic.org/~fche/utrace-ext.git
 branch name utrace-gdbstub-uprobes

If its anywhere near functioning it would have made sense to send it out
as an RFC patch-set right along with the utrace one.