Re: linux-next: add utrace tree

2010-01-20 Thread Frederic Weisbecker
On Wed, Jan 20, 2010 at 12:10:26PM +0530, Ananth N Mavinakayanahalli wrote:
  It will cause conflicts with various other trees and increase the overhead
  all around. It also causes us to trust linux-next bug reports less - as it's
  not the 'next Linux' anymore. Also, there's virtually no high-level technical
  review done in linux-next: the trees are implicitly trusted (because they are
  pushed by maintainers), bugs and conflicts are reported, but otherwise it's a
  neutral tree that includes pretty much any commit indiscriminately.

  If you need review and testing there are a number of trees you can get
  inclusion into.
 
 So would -tip be one of them? If so could you pull the utrace-ptrace
 branch in?
 
 Or did you intend some other tree (random-tracing)? (Though I think a
 ptrace reimplementation isn't 'random'-tracing :-))


Heh. No, this is a tree I use for, well, random tracing patches indeed,
which has extended to random tracing/perf/* patches over time.
I sometimes relay others' patches to Ingo through this tree, but this is
usually about small volumes and short-term storage: patches that
have already been reviewed/acked.

utrace/uprobes means a high volume of patches and longer-term
debate/review/maintenance, and I won't have the time to carry that.


 
 Ananth



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-20 Thread Frederic Weisbecker
On Wed, Jan 20, 2010 at 12:06:20PM +0530, Srikar Dronamraju wrote:
 * Frederic Weisbecker fweis...@gmail.com [2010-01-19 19:06:12]:
 
  On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
   
   What does the code in the jumped-to vma do?  Is the instrumentation code
   that corresponds to the uprobe handlers encoded in an ad hoc .so?
  
  
  Once the instrumentation is requested by a process that is not the
  instrumented one, it looks impossible to set a uprobe without some
  minimal voluntary collaboration from the instrumented process
  (events sent through IPC or whatever). So that looks too limited;
  it is no longer a true dynamic uprobe.
 
 I don't see a case where the thread being debugged refuses to place a
 probe unless the process is exiting. The traced process doesn't decide
 whether it wants to be probed or not. There could be a slight delay from
 the time the tracer requests the probe to the time it is placed, but this
 delay only affects the tracer and the tracee. This is in contrast to,
 say, stop_machine, where the threads of other applications are also
 affected.


I hadn't thought of it as a kind of tracepoint inserted in shared memory.
I was just confused :)



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-19 Thread Frederic Weisbecker
On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
  Do you have plans for a variant 
  that's completely in userspace?
 
 I don't know of any such plans, but I'd be interested to read more of
 your thoughts here.  As I understand it, you've suggested replacing the
 probed instruction with a jump into an instrumentation vma (the XOL
 area, or something similar).  Masami has demonstrated -- through his
 djprobes enhancement to kprobes -- that this can be done for many x86
 instructions.
 
 What does the code in the jumped-to vma do?  Is the instrumentation code
 that corresponds to the uprobe handlers encoded in an ad hoc .so?


Once the instrumentation is requested by a process that is not the
instrumented one, it looks impossible to set a uprobe without some
minimal voluntary collaboration from the instrumented process
(events sent through IPC or whatever). So that looks too limited;
it is no longer a true dynamic uprobe.



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-18 Thread Frederic Weisbecker
On Thu, Jan 14, 2010 at 01:29:09PM +0100, Peter Zijlstra wrote:
 On Thu, 2010-01-14 at 13:23 +0100, Frederic Weisbecker wrote:
  
  I see, so what you suggest is to have the probe set up
  as generic first. Then the process that activates it
  becomes a consumer, right?
 
 Right, so either we have it always on, for things like ftrace,
 
   in which case the creation traverses rmap and installs the probes on
   all existing mmap()s, and an mmap() hook will install it on all new
   ones.
 
 Or they're strictly consumer driven, like perf, in which case the act of
 enabling the event will install the probe (if it's not there yet).
 


Looks like a good plan.
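
To make the two modes concrete, here is a rough sketch of what the two
registration paths could look like (hypothetical types and helper names,
not an existing kernel API):

	struct uprobe {
		unsigned long	offset;		/* probe offset within the file */
		bool		enabled;	/* at least one consumer armed it */
	};

	/* "always on" (ftrace-like): arm every current mapping of the file */
	static void uprobe_register_always_on(struct uprobe *u, struct inode *inode)
	{
		struct vm_area_struct *vma;

		/* assumed helper: walk all current mappings of inode via rmap */
		for_each_vma_of_inode(inode, vma)
			install_breakpoint(u, vma);	/* assumed helper */

		/* new mmap()s of the file get the probe from an mmap() hook */
	}

	/* consumer driven (perf-like): arm lazily when an event is enabled */
	static void uprobe_event_enable(struct uprobe *u, struct vm_area_struct *vma)
	{
		if (!u->enabled) {
			install_breakpoint(u, vma);	/* assumed helper */
			u->enabled = true;
		}
	}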



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-14 Thread Frederic Weisbecker
On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote:
 On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote:
  This patch implements ftrace plugin for uprobes.
 
 Right, like others have said, trace events is a much saner interface.
 
 So the easiest way I can see that working is to register uprobes against
 a file (not a pid). Then on creation it uses rmap to find all current
 maps of that file and install the probe if there is a consumer for that
 map.
 
 Then for each new mmap() of that file, we also need to check if there's
 a consumer ready and install the probe.



That looks racy.

Say you first create a probe on /bin/ls:

perf probe p addr_in_ls /bin/ls

then something else launches /bin/ls behind your back, and the probe
gets set on it,

then you launch:

perf record -e probe: /bin/ls

Then it ends up recording the previous instance.



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-14 Thread Frederic Weisbecker
On Thu, Jan 14, 2010 at 12:43:01PM +0100, Peter Zijlstra wrote:
 On Thu, 2010-01-14 at 12:35 +0100, Frederic Weisbecker wrote:
  On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote:
   On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote:
This patch implements ftrace plugin for uprobes.
   
   Right, like others have said, trace events is a much saner interface.
   
   So the easiest way I can see that working is to register uprobes against
   a file (not a pid). Then on creation it uses rmap to find all current
   maps of that file and install the probe if there is a consumer for that
   map.
   
   Then for each new mmap() of that file, we also need to check if there's
   a consumer ready and install the probe.
  
  
  
  That looks racy.
  
  Say you first create a probe on /bin/ls:
  
  perf probe p addr_in_ls /bin/ls
  
  then something else launches /bin/ls behind your back, and the probe
  gets set on it,
  
  then you launch:
  
  perf record -e probe: /bin/ls
  
  Then it ends up recording the previous instance.
 
 Uhm, why? Only the perf /bin/ls instance has a consumer and will thus
 have a probe installed.
 
 (Or if you want to use ftrace you need to always have all instances
 probed anyway)


I see, so what you suggest is to have the probe set up
as generic first. Then the process that activates it
becomes a consumer, right?

Would work, yeah.



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-11 Thread Frederic Weisbecker
On Mon, Jan 11, 2010 at 05:56:08PM +0530, Srikar Dronamraju wrote:
 This patch implements ftrace plugin for uprobes.
 
 Description:
 Ftrace plugin provides an interface to dump data at a given address, top of
 the stack and function arguments when a user program calls a specific
 function.


So, as mentioned before, ftrace plugins tend to be relegated to
obsolescence, and I first suggested plugging this into kprobe
events so that we have a unified interface to control/create
u|k|kret probe events.

But after digging more, it only looks at first glance like uprobe
creation can follow the kprobes creation flow.

A kprobe can be created whenever we want. It probes kernel
text, which is already there, so we can set the probe,
deactivated by default, in advance.

This is much trickier in the case of uprobes, as I see two
ways to work with them:

- probing an already running process
- probing a process we are about to run

Now say we want to create a uprobe trace event for an already
running process. No problem in the workflow, we just need to
set the address and the pid. Fine.

Now what if I want to launch ls and profile a function
inside it? What can I do with a trace event? I can't create the
probe event based on a pid as I don't know it in advance.
I could give it the ls cmdline and have it activate
on the next ls launched, but this is racy as another ls can
be launched concurrently.
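
(For comparison, the way a tool can trace an app from its very beginning
without that race is to fork first, arm the probe/event on the now-known
child pid while the child waits, and only then exec the target. Here is a
minimal, generic sketch of that pattern; setup_probe_on_pid() is a purely
hypothetical placeholder for whatever probe interface ends up being used.)

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	/* hypothetical: create the uprobe/event against this known pid */
	static void setup_probe_on_pid(pid_t pid)
	{
		printf("would arm the probe on pid %d here\n", (int)pid);
	}

	int main(int argc, char **argv)
	{
		int pipefd[2];
		char buf;
		pid_t pid;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
			return 1;
		}
		if (pipe(pipefd))
			return 1;

		pid = fork();
		if (pid == 0) {
			/* child: wait until the tracer has armed the probe, then exec */
			close(pipefd[1]);
			read(pipefd[0], &buf, 1);
			execvp(argv[1], &argv[1]);
			_exit(127);
		}
		close(pipefd[0]);
		setup_probe_on_pid(pid);	/* pid is known before the target runs */
		write(pipefd[1], "x", 1);	/* release the child */
		waitpid(pid, NULL, 0);
		return 0;
	}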

So all I can say here is that an ftrace plugin or an ftrace trace
event would be only a half-useful interface to exploit the utrace
possibilities, because it only lets us trace already running
apps. Moreover, I bet the most common workflow for profiling/tracing
with uprobes is launching an app and profiling it from the beginning,
not profiling an already running one, which makes an ftrace
interface even less than half useful there.

ftrace is great for tracing the kernel, but this kind of tricky
userspace tracing workflow is not well suited to it.

What do you think?



Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

2009-12-18 Thread Frederic Weisbecker
On Fri, Dec 18, 2009 at 12:05:03PM -0800, Roland McGrath wrote:
  Please find the trivial test-case below. It hangs, because
  PTRACE_SINGLESTEP doesn't trigger the trap.
 
 2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.
 
  (not sure this matters, but I did the testing under kvm)
 
 Apparently it does.  You should hack some printks into do_debug() and see
 how kvm is differing from real hardware.  (Actually you can probably do
 this with a notifier added by a module, not that you are shy about
 recompiling!)  
 
 Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
 sufficiently faithful.  Conceivably, kvm is being consistent with some
 older hardware and we have encoded assumptions that only newer hardware
 meets.  But I'd guess it's just a plain kvm bug.


It looks like in kvm, before entering the guest, we restore its
debug registers:

vcpu_enter_guest():

	if (unlikely(vcpu->arch.switch_db_regs)) {
		set_debugreg(0, 7);
		set_debugreg(vcpu->arch.eff_db[0], 0);
		set_debugreg(vcpu->arch.eff_db[1], 1);
		set_debugreg(vcpu->arch.eff_db[2], 2);
		set_debugreg(vcpu->arch.eff_db[3], 3);
	}


But what happens to dr6, I don't know.
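
If the guest's dr6 needs the same switch treatment, a minimal sketch of
what that could look like (assuming kvm keeps a saved guest dr6 value in
vcpu->arch; the field name here is hypothetical):

	if (unlikely(vcpu->arch.switch_db_regs)) {
		set_debugreg(0, 7);
		set_debugreg(vcpu->arch.eff_db[0], 0);
		set_debugreg(vcpu->arch.eff_db[1], 1);
		set_debugreg(vcpu->arch.eff_db[2], 2);
		set_debugreg(vcpu->arch.eff_db[3], 3);
		/* hypothetical: restore the guest's saved dr6 as well */
		set_debugreg(vcpu->arch.dr6, 6);
	}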

Adding Avi and Jan in Cc.
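
For reference, here is a minimal sketch of the kind of single-step
test-case being discussed (a hypothetical reconstruction, not the
original program from the report): the parent single-steps the child
and expects a SIGTRAP stop after each PTRACE_SINGLESTEP.

	#include <stdio.h>
	#include <signal.h>
	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	int main(void)
	{
		int status, i;
		pid_t pid = fork();

		if (pid == 0) {
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);			/* let the parent catch up */
			_exit(0);
		}

		waitpid(pid, &status, 0);		/* initial SIGSTOP */

		for (i = 0; i < 10 && !WIFEXITED(status); i++) {
			ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
			waitpid(pid, &status, 0);
			if (WIFSTOPPED(status))
				printf("step %d: stopped by signal %d\n",
				       i, WSTOPSIG(status));
		}
		kill(pid, SIGKILL);
		return 0;
	}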



Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

2009-12-17 Thread Frederic Weisbecker
On Fri, Dec 18, 2009 at 03:10:42AM +0100, Oleg Nesterov wrote:
 On 12/17, Roland McGrath wrote:
 
  Comparing to the old (2.6.32) logic, I think it might be this (untested).
  I also note this is the sole use of get_si_code, seems like it should
  just be rolled in here.
 
 Well, it is too late for me to even try to read this patch ;)
 
 but...
 
  @@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
  	 * We already checked v86 mode above, so we can check for kernel mode
  	 * by just checking the CPL of CS.
  	 */
  +	dr6 = tsk->thread.debugreg6;
 
 why? we have tsk->thread.debugreg6 = dr6 above


Yeah.


 
  	if ((dr6 & DR_STEP) && !user_mode(regs)) {
  		tsk->thread.debugreg6 &= ~DR_STEP;
  		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
  		regs->flags &= ~X86_EFLAGS_TF;
 
 this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF


Yep, I don't understand what happens here either. This logic
was there before the refactoring, and the comment indicates we want
to ignore traps from the kernel. Why do we set this flag on a random
thread?



  +	} else if (dr6 & (DR_STEP | DR_TRAP_BITS)) {
  +		send_sigtrap(tsk, regs, error_code, get_si_code(dr6));
  	}
  -	si_code = get_si_code(tsk->thread.debugreg6);
  -	if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
  -		send_sigtrap(tsk, regs, error_code, si_code);
  +
 
 can't understand how this change can fix the problem. We should always
 send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF?
 
 OK. I blindly applied this patch, step-simple still fails.


Yep, that doesn't fix your problem, but this patch makes sense
in that if we were not in user mode while the step occurred,
we shouldn't send the signal.



Re: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure

2009-03-16 Thread Frederic Weisbecker
On Mon, Mar 16, 2009 at 06:18:00PM -0400, Frank Ch. Eigler wrote:
 Hi -
 
 
 On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote:
 
  [...]
   As far as I know, utrace supports multiple trace-engines on a process.
   Since ptrace is just an engine of utrace, you can add another engine on 
   utrace.
   
   utrace-+-ptrace_engine---owner_process
          |
          +-systemtap_module
          |
          +-ftrace_plugin
 
 Right.  In this way, utrace is simply a multiplexing intermediary.
 
 
   Here, Frank had posted an example of a utrace-ftrace engine.
   http://lkml.org/lkml/2009/1/27/294
   
   And here is his latest patch (which seems to support syscall tracing...)
   http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124
   
  
  Reminder : we are looking at system-wide tracing here. Here are some
  comments about the current utrace implementation.
  
  Looking at include/linux/utrace.h from the tree
  
  17  * A tracing engine starts by calling utrace_attach_task() or
  18  * utrace_attach_pid() on the chosen thread, passing in a set of hooks
  19  * (struct utrace_engine_ops), and some associated data.  This produces a
  20  * struct utrace_engine, which is the handle used for all other
  21  * operations.  An attached engine has its ops vector, its data, and an
  22  * event mask controlled by utrace_set_events().
  
  So if the system has, say, 3000 threads, then we have 3000 struct
  utrace_engine created? I wonder what effect this could have on
  cachelines if this is used to trace hot paths like system call
  entry/exit. Have you benchmarked this kind of scenario under tbench?
 
 It has not been a problem, since utrace_engines are designed to be
 lightweight.  Starting or stopping a systemtap script of the form
 
 probe process.syscall {}
 
 appears to have no noticeable impact on a tbench suite.
 
 
  24  * For each event bit that is set, that engine will get the
  25  * appropriate ops-report_*() callback when the event occurs.  The
  26  * struct utrace_engine_ops need not provide callbacks for an event
  27  * unless the engine sets one of the associated event bits.
  
  Looking at utrace_set_events(), we seem to be limited to 32 events on
  32-bit architectures because it uses a bitmask? Isn't it a bit small?
 
 There are only a few types of thread events that involve different
 classes of treatment, or different degrees of freedom in terms of
 interference with the uninstrumented fast path of the threads.
 
 For example, it does not make sense to have different flag bits for
 different system calls, since choosing to trace *any* system call
 involves taking the thread off of the fast path with the TIF_ flag.
 Once it's off the fast path, it doesn't matter whether the utrace core
 or some client performs syscall discrimination, so it is left to the
 client.
 
 
  682 /**
  683  * utrace_set_events_pid - choose which event reports a tracing engine gets
  684  * @pid:       thread to affect
  685  * @engine: attached engine to affect
  686  * @eventmask:  new event mask
  687  *
  688  * This is the same as utrace_set_events(), but takes a struct pid
  689  * pointer rather than a struct task_struct pointer.  The caller must
  690  * hold a ref on @pid, but does not need to worry about the task
  691  * staying valid.  If it's been reaped so that @pid points nowhere,
  692  * then this call returns -%ESRCH.
  
  
  Comments like "but does not need to worry about the task staying valid"
  do not make me feel safe and comfortable at all. Could you explain
  how you can assume that dereferencing an invalid pointer will return
  NULL?
 
 (We're doing a final round of internal (pre-LKML) reviews of the
 utrace implementation right now on utrace-devel@redhat.com, where such
 comments get fastest attention from the experts.)
 
 For this particular issue, the utrace documentation file explains the
 liveness rules for the various pointers that can be fed to or received
 from utrace functions.  This is not about feeling safe, it's about
 what the mechanism is deliberately designed to permit.
 
 
  About the utrace_attach_task() :
  
  244         if (unlikely(target->flags & PF_KTHREAD))
  245                 /*
  246                  * Silly kernel, utrace is for users!
  247                  */
  248                 return ERR_PTR(-EPERM);
  
  So we cannot trace kernel threads ?
 
 I'm not quite sure about all the reasons for this, but I believe that
 kernel threads don't tend to engage in job control / signal /
 system-call activities the same way as normal user threads do.
 


Some of them use some syscalls, but that doesn't involve a user/kernel switch,
so it's not traceable by hooking syscall_entry/exit or using tracehooks.
It would require specific hooks on the sys_* functions for that.

So this check