Re: KPROBES: Instrumenting a function's call site

2007-11-12 Thread Ananth N Mavinakayanahalli
On Thu, Nov 08, 2007 at 02:42:33PM -0500, Avishay Traeger wrote:
> Sorry for reviving a thread from two months ago... :)
> 
> On Wed, 2007-09-26 at 10:09 +0530, Ananth N Mavinakayanahalli wrote:
> > On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > > Hello,
> > > I am trying to use kprobes to measure the latency of a function by
> > > instrumenting its call site.  Basically, I find the call instruction,
> > > and insert a kprobe with a pre-handler and post-handler at that point.
> > > The pre-handler measures the latency (reads the TSC counter).  The
> > > post-handler measures the latency again, and subtracts the value that
> > > was read in the pre-handler to compute the total latency of the called
> > > function.
> > 
> > This sounds ok...
> > 
> > > So to measure the latency of foo(), I basically want kprobes to do this:
> > > pre_handler();
> > > foo();
> > > post_handler();
> > > 
> > > The problem is that the latencies that I am getting are consistently low
> > > (~10,000 cycles).  When I manually instrument the functions, the latency
> > > is about 20,000,000 cycles.  Clearly something is not right here.
> > 
> > Is foo() called from too many different places? If so, are you
> > interested with only the invocation of foo() from a specific callsite?
> >  
> > > Is this a known issue?  Instead of using the post-handler, I can try to
> > > add a kprobe to the following instruction with a pre-handler.  I was
> > > just curious if there was something fundamentally wrong with the
> > > approach I took, or maybe a bug that you should be made aware of.
> > 
> > I am not too sure... single-stepping a "call" instruction from a
> > different memory location (single-stepping out of line) requires some
> > fixups and kprobes handles such fixups just fine (see resume_execution()
> > in arch//kernel/kprobes.c)
> > 
> > You could try a a couple of approaches for starters.
> > 
> > a. As you mention above, a kprobe on the function invocation and the
> > other on the instruction following the call; both need just pre_handlers. 
> > 
> > b.
> > - Insert a kprobe and a kretprobe on foo()
> > - The kprobe needs to have only a pre_handler that'll measure the latency
> > - A similar handler for the kretprobe handler can measure the latency
> > again and their difference will give you foo()'s latency.
> > 
> >  though will require you to do some housekeeping in case foo() is
> > reentrant to track which return instance corresponds to which call.
> > 
> > Ananth
> > 
> > PS: There was a thought of providing a facility to run a handler at
> > function entry even when just a kretprobe is used. Maybe we need to
> > relook at that; it'd have been useful in this case.
> 
> The approach that we took until now is (a):
> 
> kprobe1.pre_handler();
> foo();
> kprobe2.pre_handler();
> 
> However, my colleague Ivan has come up with another approach that I
> think it pretty novel:
> - kprob1.pre-handler is placed at the call site as before.
> - A post-handler is also added to this same kprobe, which saves foo's
> return value, and modifies it to point to a new function.  This new
> function contains the code that used to be in kprobe2.pre_handler
> (reading the TSC register a second time and computing the latency).
> This new function then returns to the original return location.

Modifying the return address is analogous to how kretprobes are
implemented.

> The only caveat is that to handle recursion, you need to save a stack of
> return addresses.
> 
> This has 2 main benefits:
> 1. It guarantees that the code contained in the second kprobe is
> executed only when the first is (in case there is a jump in the code to
> the location of the 2nd kprobe).
> 2. We only need one kprobe instead of two, which speeds things up.
> 
> We have tested this, and it works great.  Just wanted to let you know
> about this technique in case this issue comes up again.

Thanks! Maybe it'll be useful if you could code this up as an
independent module and post it to lkml so others can use it? If its
deemed useful, we could add it to the kprobes samples/ modules.

> Have you ever seen kprobes used in this way?

I've heard of people doing things similar to what you've described. (I
think someone had also implemented a "hot-patching" mechanism too, using
kprobes).

Ananth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-11-12 Thread Ananth N Mavinakayanahalli
On Thu, Nov 08, 2007 at 02:42:33PM -0500, Avishay Traeger wrote:
 Sorry for reviving a thread from two months ago... :)
 
 On Wed, 2007-09-26 at 10:09 +0530, Ananth N Mavinakayanahalli wrote:
  On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
   Hello,
   I am trying to use kprobes to measure the latency of a function by
   instrumenting its call site.  Basically, I find the call instruction,
   and insert a kprobe with a pre-handler and post-handler at that point.
   The pre-handler measures the latency (reads the TSC counter).  The
   post-handler measures the latency again, and subtracts the value that
   was read in the pre-handler to compute the total latency of the called
   function.
  
  This sounds ok...
  
   So to measure the latency of foo(), I basically want kprobes to do this:
   pre_handler();
   foo();
   post_handler();
   
   The problem is that the latencies that I am getting are consistently low
   (~10,000 cycles).  When I manually instrument the functions, the latency
   is about 20,000,000 cycles.  Clearly something is not right here.
  
  Is foo() called from too many different places? If so, are you
  interested with only the invocation of foo() from a specific callsite?
   
   Is this a known issue?  Instead of using the post-handler, I can try to
   add a kprobe to the following instruction with a pre-handler.  I was
   just curious if there was something fundamentally wrong with the
   approach I took, or maybe a bug that you should be made aware of.
  
  I am not too sure... single-stepping a call instruction from a
  different memory location (single-stepping out of line) requires some
  fixups and kprobes handles such fixups just fine (see resume_execution()
  in arch/arch/kernel/kprobes.c)
  
  You could try a a couple of approaches for starters.
  
  a. As you mention above, a kprobe on the function invocation and the
  other on the instruction following the call; both need just pre_handlers. 
  
  b.
  - Insert a kprobe and a kretprobe on foo()
  - The kprobe needs to have only a pre_handler that'll measure the latency
  - A similar handler for the kretprobe handler can measure the latency
  again and their difference will give you foo()'s latency.
  
  b though will require you to do some housekeeping in case foo() is
  reentrant to track which return instance corresponds to which call.
  
  Ananth
  
  PS: There was a thought of providing a facility to run a handler at
  function entry even when just a kretprobe is used. Maybe we need to
  relook at that; it'd have been useful in this case.
 
 The approach that we took until now is (a):
 
 kprobe1.pre_handler();
 foo();
 kprobe2.pre_handler();
 
 However, my colleague Ivan has come up with another approach that I
 think it pretty novel:
 - kprob1.pre-handler is placed at the call site as before.
 - A post-handler is also added to this same kprobe, which saves foo's
 return value, and modifies it to point to a new function.  This new
 function contains the code that used to be in kprobe2.pre_handler
 (reading the TSC register a second time and computing the latency).
 This new function then returns to the original return location.

Modifying the return address is analogous to how kretprobes are
implemented.

 The only caveat is that to handle recursion, you need to save a stack of
 return addresses.
 
 This has 2 main benefits:
 1. It guarantees that the code contained in the second kprobe is
 executed only when the first is (in case there is a jump in the code to
 the location of the 2nd kprobe).
 2. We only need one kprobe instead of two, which speeds things up.
 
 We have tested this, and it works great.  Just wanted to let you know
 about this technique in case this issue comes up again.

Thanks! Maybe it'll be useful if you could code this up as an
independent module and post it to lkml so others can use it? If its
deemed useful, we could add it to the kprobes samples/ modules.

 Have you ever seen kprobes used in this way?

I've heard of people doing things similar to what you've described. (I
think someone had also implemented a hot-patching mechanism too, using
kprobes).

Ananth
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-11-08 Thread Avishay Traeger
Sorry for reviving a thread from two months ago... :)

On Wed, 2007-09-26 at 10:09 +0530, Ananth N Mavinakayanahalli wrote:
> On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > Hello,
> > I am trying to use kprobes to measure the latency of a function by
> > instrumenting its call site.  Basically, I find the call instruction,
> > and insert a kprobe with a pre-handler and post-handler at that point.
> > The pre-handler measures the latency (reads the TSC counter).  The
> > post-handler measures the latency again, and subtracts the value that
> > was read in the pre-handler to compute the total latency of the called
> > function.
> 
> This sounds ok...
> 
> > So to measure the latency of foo(), I basically want kprobes to do this:
> > pre_handler();
> > foo();
> > post_handler();
> > 
> > The problem is that the latencies that I am getting are consistently low
> > (~10,000 cycles).  When I manually instrument the functions, the latency
> > is about 20,000,000 cycles.  Clearly something is not right here.
> 
> Is foo() called from too many different places? If so, are you
> interested with only the invocation of foo() from a specific callsite?
>  
> > Is this a known issue?  Instead of using the post-handler, I can try to
> > add a kprobe to the following instruction with a pre-handler.  I was
> > just curious if there was something fundamentally wrong with the
> > approach I took, or maybe a bug that you should be made aware of.
> 
> I am not too sure... single-stepping a "call" instruction from a
> different memory location (single-stepping out of line) requires some
> fixups and kprobes handles such fixups just fine (see resume_execution()
> in arch//kernel/kprobes.c)
> 
> You could try a a couple of approaches for starters.
> 
> a. As you mention above, a kprobe on the function invocation and the
> other on the instruction following the call; both need just pre_handlers. 
> 
> b.
> - Insert a kprobe and a kretprobe on foo()
> - The kprobe needs to have only a pre_handler that'll measure the latency
> - A similar handler for the kretprobe handler can measure the latency
> again and their difference will give you foo()'s latency.
> 
>  though will require you to do some housekeeping in case foo() is
> reentrant to track which return instance corresponds to which call.
> 
> Ananth
> 
> PS: There was a thought of providing a facility to run a handler at
> function entry even when just a kretprobe is used. Maybe we need to
> relook at that; it'd have been useful in this case.

The approach that we took until now is (a):

kprobe1.pre_handler();
foo();
kprobe2.pre_handler();

However, my colleague Ivan has come up with another approach that I
think it pretty novel:
- kprob1.pre-handler is placed at the call site as before.
- A post-handler is also added to this same kprobe, which saves foo's
return value, and modifies it to point to a new function.  This new
function contains the code that used to be in kprobe2.pre_handler
(reading the TSC register a second time and computing the latency).
This new function then returns to the original return location.

The only caveat is that to handle recursion, you need to save a stack of
return addresses.

This has 2 main benefits:
1. It guarantees that the code contained in the second kprobe is
executed only when the first is (in case there is a jump in the code to
the location of the 2nd kprobe).
2. We only need one kprobe instead of two, which speeds things up.

We have tested this, and it works great.  Just wanted to let you know
about this technique in case this issue comes up again.  Have you ever
seen kprobes used in this way?

Thanks,
Avishay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-11-08 Thread Avishay Traeger
Sorry for reviving a thread from two months ago... :)

On Wed, 2007-09-26 at 10:09 +0530, Ananth N Mavinakayanahalli wrote:
 On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
  Hello,
  I am trying to use kprobes to measure the latency of a function by
  instrumenting its call site.  Basically, I find the call instruction,
  and insert a kprobe with a pre-handler and post-handler at that point.
  The pre-handler measures the latency (reads the TSC counter).  The
  post-handler measures the latency again, and subtracts the value that
  was read in the pre-handler to compute the total latency of the called
  function.
 
 This sounds ok...
 
  So to measure the latency of foo(), I basically want kprobes to do this:
  pre_handler();
  foo();
  post_handler();
  
  The problem is that the latencies that I am getting are consistently low
  (~10,000 cycles).  When I manually instrument the functions, the latency
  is about 20,000,000 cycles.  Clearly something is not right here.
 
 Is foo() called from too many different places? If so, are you
 interested with only the invocation of foo() from a specific callsite?
  
  Is this a known issue?  Instead of using the post-handler, I can try to
  add a kprobe to the following instruction with a pre-handler.  I was
  just curious if there was something fundamentally wrong with the
  approach I took, or maybe a bug that you should be made aware of.
 
 I am not too sure... single-stepping a call instruction from a
 different memory location (single-stepping out of line) requires some
 fixups and kprobes handles such fixups just fine (see resume_execution()
 in arch/arch/kernel/kprobes.c)
 
 You could try a a couple of approaches for starters.
 
 a. As you mention above, a kprobe on the function invocation and the
 other on the instruction following the call; both need just pre_handlers. 
 
 b.
 - Insert a kprobe and a kretprobe on foo()
 - The kprobe needs to have only a pre_handler that'll measure the latency
 - A similar handler for the kretprobe handler can measure the latency
 again and their difference will give you foo()'s latency.
 
 b though will require you to do some housekeeping in case foo() is
 reentrant to track which return instance corresponds to which call.
 
 Ananth
 
 PS: There was a thought of providing a facility to run a handler at
 function entry even when just a kretprobe is used. Maybe we need to
 relook at that; it'd have been useful in this case.

The approach that we took until now is (a):

kprobe1.pre_handler();
foo();
kprobe2.pre_handler();

However, my colleague Ivan has come up with another approach that I
think it pretty novel:
- kprob1.pre-handler is placed at the call site as before.
- A post-handler is also added to this same kprobe, which saves foo's
return value, and modifies it to point to a new function.  This new
function contains the code that used to be in kprobe2.pre_handler
(reading the TSC register a second time and computing the latency).
This new function then returns to the original return location.

The only caveat is that to handle recursion, you need to save a stack of
return addresses.

This has 2 main benefits:
1. It guarantees that the code contained in the second kprobe is
executed only when the first is (in case there is a jump in the code to
the location of the 2nd kprobe).
2. We only need one kprobe instead of two, which speeds things up.

We have tested this, and it works great.  Just wanted to let you know
about this technique in case this issue comes up again.  Have you ever
seen kprobes used in this way?

Thanks,
Avishay

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 10:28 -0700, Keshavamurthy, Anil S wrote:
> On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
> > On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > > Hello,
> > > I am trying to use kprobes to measure the latency of a function by
> > > instrumenting its call site.  Basically, I find the call instruction,
> > > and insert a kprobe with a pre-handler and post-handler at that point.
> > > The pre-handler measures the latency (reads the TSC counter).  The
> > > post-handler measures the latency again, and subtracts the value that
> > > was read in the pre-handler to compute the total latency of the called
> > > function.
> > 
> > This sounds ok...
> 
> So what you are really measuring is the latency of just that single 
> instruction where you have inserted the probe i.e. because your
> pre-handler is called just before the probed instruction is executed and
> your post-handler is called right after you probed instruction is 
> single-stepped.

Exactly - I want to profile that single instance of the call.

> > 
> > > So to measure the latency of foo(), I basically want kprobes to do this:
> > > pre_handler();
> > > foo();
> When you insert a probe, you are inserting  probe on an instruction boundary
> and not at function level.
> > > post_handler();
> 
> Hence the above looks like
> 
> pre-handler()
> Probed-instruction; // most likely the first instruction in the foo();
> post-hanlder()
> rest-of-foo()

I see.

> > > 
> > > The problem is that the latencies that I am getting are consistently low
> > > (~10,000 cycles).  When I manually instrument the functions, the latency
> > > is about 20,000,000 cycles.  Clearly something is not right here.
> As I mentioned above what you are seeing is the latency of just the
> probed instruction and hence it is very very low compared to
> the latency of the function foo().
> 
> > You could try a a couple of approaches for starters.
> I agree with Ananth, you can try the below approaches
> for your measurements.
> 
> > 
> > a. As you mention above, a kprobe on the function invocation and the
> > other on the instruction following the call; both need just pre_handlers. 
> > 
> > b.
> > - Insert a kprobe and a kretprobe on foo()
> > - The kprobe needs to have only a pre_handler that'll measure the latency
> > - A similar handler for the kretprobe handler can measure the latency
> > again and their difference will give you foo()'s latency.
> > 
> >  though will require you to do some housekeeping in case foo() is
> > reentrant to track which return instance corresponds to which call.
> > 
> > Ananth
> > 
> > PS: There was a thought of providing a facility to run a handler at
> > function entry even when just a kretprobe is used. Maybe we need to
> > relook at that; it'd have been useful in this case.

Thanks for the clarifications!

Avishay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 22:57 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Sep 26, 2007 at 12:09:35PM -0400, Avishay Traeger wrote:
> > On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
> > > What happens when the "call" is singlestepped is that the instruction
> > > pointer is moved to the call target. That explains the lower latency you
> > > are seeing. You'll need to do something along the lines I suggested in
> > > the earlier mail.
> > 
> > Can you please explain what you mean by this more clearly?  I'm not a
> > kprobes expert yet.  Specifically, using kprobes the way that I did,
> > what will the resulting code look like?  Also, what do you mean by
> > "singlestepped"?
> 
> If you single-step (regs->eflags | TF_MASK in i386) on a call instruction,
> you'll end up at the call target; ie., after the post_kprobe_handler()
> returns, the instruction pointer will point to the first instruction
> of foo().
> 
> Try printk()ing the instruction pointer(regs) after resume_execution()
> in the post_kprobe_handler() in your arch//kernel/kprobes.c, you'll
> see what I mean.
> 
> And when I say singlestepped, I mean executing one instruction under the
> architecture specific single step enable flag - the "trap" flag for i386,
> the MSR_SE for powerpc, etc. Evidently, this'll mean single-stepping a
> single instruction.
> 
> Ananth

I see - thanks for all your prompt and helpful advice!

Avishay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Keshavamurthy, Anil S
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
> On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > Hello,
> > I am trying to use kprobes to measure the latency of a function by
> > instrumenting its call site.  Basically, I find the call instruction,
> > and insert a kprobe with a pre-handler and post-handler at that point.
> > The pre-handler measures the latency (reads the TSC counter).  The
> > post-handler measures the latency again, and subtracts the value that
> > was read in the pre-handler to compute the total latency of the called
> > function.
> 
> This sounds ok...

So what you are really measuring is the latency of just that single 
instruction where you have inserted the probe i.e. because your
pre-handler is called just before the probed instruction is executed and
your post-handler is called right after you probed instruction is 
single-stepped.

> 
> > So to measure the latency of foo(), I basically want kprobes to do this:
> > pre_handler();
> > foo();
When you insert a probe, you are inserting  probe on an instruction boundary
and not at function level.
> > post_handler();

Hence the above looks like

pre-handler()
Probed-instruction; // most likely the first instruction in the foo();
post-hanlder()
rest-of-foo()

> > 
> > The problem is that the latencies that I am getting are consistently low
> > (~10,000 cycles).  When I manually instrument the functions, the latency
> > is about 20,000,000 cycles.  Clearly something is not right here.
As I mentioned above what you are seeing is the latency of just the
probed instruction and hence it is very very low compared to
the latency of the function foo().

> You could try a a couple of approaches for starters.
I agree with Ananth, you can try the below approaches
for your measurements.

> 
> a. As you mention above, a kprobe on the function invocation and the
> other on the instruction following the call; both need just pre_handlers. 
> 
> b.
> - Insert a kprobe and a kretprobe on foo()
> - The kprobe needs to have only a pre_handler that'll measure the latency
> - A similar handler for the kretprobe handler can measure the latency
> again and their difference will give you foo()'s latency.
> 
>  though will require you to do some housekeeping in case foo() is
> reentrant to track which return instance corresponds to which call.
> 
> Ananth
> 
> PS: There was a thought of providing a facility to run a handler at
> function entry even when just a kretprobe is used. Maybe we need to
> relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Ananth N Mavinakayanahalli
On Wed, Sep 26, 2007 at 12:09:35PM -0400, Avishay Traeger wrote:
> On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
> > What happens when the "call" is singlestepped is that the instruction
> > pointer is moved to the call target. That explains the lower latency you
> > are seeing. You'll need to do something along the lines I suggested in
> > the earlier mail.
> 
> Can you please explain what you mean by this more clearly?  I'm not a
> kprobes expert yet.  Specifically, using kprobes the way that I did,
> what will the resulting code look like?  Also, what do you mean by
> "singlestepped"?

If you single-step (regs->eflags | TF_MASK in i386) on a call instruction,
you'll end up at the call target; ie., after the post_kprobe_handler()
returns, the instruction pointer will point to the first instruction
of foo().

Try printk()ing the instruction pointer(regs) after resume_execution()
in the post_kprobe_handler() in your arch//kernel/kprobes.c, you'll
see what I mean.

And when I say singlestepped, I mean executing one instruction under the
architecture specific single step enable flag - the "trap" flag for i386,
the MSR_SE for powerpc, etc. Evidently, this'll mean single-stepping a
single instruction.

Ananth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Abhishek Sagar
On 9/26/07, Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> wrote:

> PS: There was a thought of providing a facility to run a handler at
> function entry even when just a kretprobe is used. Maybe we need to
> relook at that; it'd have been useful in this case.

That would be really useful. I was writing a tool to dump some
function profiling info through /proc. This would help save an extra
probe on each function's entry.

-
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Abhishek Sagar
On 9/26/07, Avishay Traeger <[EMAIL PROTECTED]> wrote:
> So to measure the latency of foo(), I basically want kprobes to do this:
> pre_handler();
> foo();
> post_handler();
>
> The problem is that the latencies that I am getting are consistently low
> (~10,000 cycles).  When I manually instrument the functions, the latency
> is about 20,000,000 cycles.  Clearly something is not right here.

Single-stepping is done with preemption (and on some archs like ARM,
interrupts) disabled. May be that is contributing to such a skew.

> Is this a known issue?  Instead of using the post-handler, I can try to
> add a kprobe to the following instruction with a pre-handler.  I was
> just curious if there was something fundamentally wrong with the
> approach I took, or maybe a bug that you should be made aware of.
>
> Please CC me on any replies (not subscribed to LKML).
>
> Thanks,
> Avishay
>
--
Regards
Abhishek Sagar

-
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
> What happens when the "call" is singlestepped is that the instruction
> pointer is moved to the call target. That explains the lower latency you
> are seeing. You'll need to do something along the lines I suggested in
> the earlier mail.

Can you please explain what you mean by this more clearly?  I'm not a
kprobes expert yet.  Specifically, using kprobes the way that I did,
what will the resulting code look like?  Also, what do you mean by
"singlestepped"?

Thanks,
Avishay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Ananth N Mavinakayanahalli
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
> On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > Hello,
> > I am trying to use kprobes to measure the latency of a function by
> > instrumenting its call site.  Basically, I find the call instruction,
> > and insert a kprobe with a pre-handler and post-handler at that point.
> > The pre-handler measures the latency (reads the TSC counter).  The
> > post-handler measures the latency again, and subtracts the value that
> > was read in the pre-handler to compute the total latency of the called
> > function.
> 
> This sounds ok...

Thinking further, its not ok :(

> > So to measure the latency of foo(), I basically want kprobes to do this:
> > pre_handler();
> > foo();
> > post_handler();
> > 
> > The problem is that the latencies that I am getting are consistently low
> > (~10,000 cycles).  When I manually instrument the functions, the latency
> > is about 20,000,000 cycles.  Clearly something is not right here.

What happens when the "call" is singlestepped is that the instruction
pointer is moved to the call target. That explains the lower latency you
are seeing. You'll need to do something along the lines I suggested in
the earlier mail.

Ananth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Ananth N Mavinakayanahalli
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
 On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
  Hello,
  I am trying to use kprobes to measure the latency of a function by
  instrumenting its call site.  Basically, I find the call instruction,
  and insert a kprobe with a pre-handler and post-handler at that point.
  The pre-handler measures the latency (reads the TSC counter).  The
  post-handler measures the latency again, and subtracts the value that
  was read in the pre-handler to compute the total latency of the called
  function.
 
 This sounds ok...

Thinking further, its not ok :(

  So to measure the latency of foo(), I basically want kprobes to do this:
  pre_handler();
  foo();
  post_handler();
  
  The problem is that the latencies that I am getting are consistently low
  (~10,000 cycles).  When I manually instrument the functions, the latency
  is about 20,000,000 cycles.  Clearly something is not right here.

What happens when the call is singlestepped is that the instruction
pointer is moved to the call target. That explains the lower latency you
are seeing. You'll need to do something along the lines I suggested in
the earlier mail.

Ananth
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
 What happens when the call is singlestepped is that the instruction
 pointer is moved to the call target. That explains the lower latency you
 are seeing. You'll need to do something along the lines I suggested in
 the earlier mail.

Can you please explain what you mean by this more clearly?  I'm not a
kprobes expert yet.  Specifically, using kprobes the way that I did,
what will the resulting code look like?  Also, what do you mean by
singlestepped?

Thanks,
Avishay

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Abhishek Sagar
On 9/26/07, Avishay Traeger [EMAIL PROTECTED] wrote:
 So to measure the latency of foo(), I basically want kprobes to do this:
 pre_handler();
 foo();
 post_handler();

 The problem is that the latencies that I am getting are consistently low
 (~10,000 cycles).  When I manually instrument the functions, the latency
 is about 20,000,000 cycles.  Clearly something is not right here.

Single-stepping is done with preemption (and on some archs like ARM,
interrupts) disabled. May be that is contributing to such a skew.

 Is this a known issue?  Instead of using the post-handler, I can try to
 add a kprobe to the following instruction with a pre-handler.  I was
 just curious if there was something fundamentally wrong with the
 approach I took, or maybe a bug that you should be made aware of.

 Please CC me on any replies (not subscribed to LKML).

 Thanks,
 Avishay

--
Regards
Abhishek Sagar

-
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Abhishek Sagar
On 9/26/07, Ananth N Mavinakayanahalli [EMAIL PROTECTED] wrote:

 PS: There was a thought of providing a facility to run a handler at
 function entry even when just a kretprobe is used. Maybe we need to
 relook at that; it'd have been useful in this case.

That would be really useful. I was writing a tool to dump some
function profiling info through /proc. This would help save an extra
probe on each function's entry.

-
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Ananth N Mavinakayanahalli
On Wed, Sep 26, 2007 at 12:09:35PM -0400, Avishay Traeger wrote:
 On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
  What happens when the call is singlestepped is that the instruction
  pointer is moved to the call target. That explains the lower latency you
  are seeing. You'll need to do something along the lines I suggested in
  the earlier mail.
 
 Can you please explain what you mean by this more clearly?  I'm not a
 kprobes expert yet.  Specifically, using kprobes the way that I did,
 what will the resulting code look like?  Also, what do you mean by
 singlestepped?

If you single-step (regs-eflags | TF_MASK in i386) on a call instruction,
you'll end up at the call target; ie., after the post_kprobe_handler()
returns, the instruction pointer will point to the first instruction
of foo().

Try printk()ing the instruction pointer(regs) after resume_execution()
in the post_kprobe_handler() in your arch/arch/kernel/kprobes.c, you'll
see what I mean.

And when I say singlestepped, I mean executing one instruction under the
architecture specific single step enable flag - the trap flag for i386,
the MSR_SE for powerpc, etc. Evidently, this'll mean single-stepping a
single instruction.

Ananth
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Keshavamurthy, Anil S
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
 On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
  Hello,
  I am trying to use kprobes to measure the latency of a function by
  instrumenting its call site.  Basically, I find the call instruction,
  and insert a kprobe with a pre-handler and post-handler at that point.
  The pre-handler measures the latency (reads the TSC counter).  The
  post-handler measures the latency again, and subtracts the value that
  was read in the pre-handler to compute the total latency of the called
  function.
 
 This sounds ok...

So what you are really measuring is the latency of just that single 
instruction where you have inserted the probe i.e. because your
pre-handler is called just before the probed instruction is executed and
your post-handler is called right after you probed instruction is 
single-stepped.

 
  So to measure the latency of foo(), I basically want kprobes to do this:
  pre_handler();
  foo();
When you insert a probe, you are inserting  probe on an instruction boundary
and not at function level.
  post_handler();

Hence the above looks like

pre-handler()
Probed-instruction; // most likely the first instruction in the foo();
post-hanlder()
rest-of-foo()

  
  The problem is that the latencies that I am getting are consistently low
  (~10,000 cycles).  When I manually instrument the functions, the latency
  is about 20,000,000 cycles.  Clearly something is not right here.
As I mentioned above what you are seeing is the latency of just the
probed instruction and hence it is very very low compared to
the latency of the function foo().

 You could try a a couple of approaches for starters.
I agree with Ananth, you can try the below approaches
for your measurements.

 
 a. As you mention above, a kprobe on the function invocation and the
 other on the instruction following the call; both need just pre_handlers. 
 
 b.
 - Insert a kprobe and a kretprobe on foo()
 - The kprobe needs to have only a pre_handler that'll measure the latency
 - A similar handler for the kretprobe handler can measure the latency
 again and their difference will give you foo()'s latency.
 
 b though will require you to do some housekeeping in case foo() is
 reentrant to track which return instance corresponds to which call.
 
 Ananth
 
 PS: There was a thought of providing a facility to run a handler at
 function entry even when just a kretprobe is used. Maybe we need to
 relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 22:57 +0530, Ananth N Mavinakayanahalli wrote:
 On Wed, Sep 26, 2007 at 12:09:35PM -0400, Avishay Traeger wrote:
  On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
   What happens when the call is singlestepped is that the instruction
   pointer is moved to the call target. That explains the lower latency you
   are seeing. You'll need to do something along the lines I suggested in
   the earlier mail.
  
  Can you please explain what you mean by this more clearly?  I'm not a
  kprobes expert yet.  Specifically, using kprobes the way that I did,
  what will the resulting code look like?  Also, what do you mean by
  singlestepped?
 
 If you single-step (regs-eflags | TF_MASK in i386) on a call instruction,
 you'll end up at the call target; ie., after the post_kprobe_handler()
 returns, the instruction pointer will point to the first instruction
 of foo().
 
 Try printk()ing the instruction pointer(regs) after resume_execution()
 in the post_kprobe_handler() in your arch/arch/kernel/kprobes.c, you'll
 see what I mean.
 
 And when I say singlestepped, I mean executing one instruction under the
 architecture specific single step enable flag - the trap flag for i386,
 the MSR_SE for powerpc, etc. Evidently, this'll mean single-stepping a
 single instruction.
 
 Ananth

I see - thanks for all your prompt and helpful advice!

Avishay

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 10:28 -0700, Keshavamurthy, Anil S wrote:
 On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
  On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
   Hello,
   I am trying to use kprobes to measure the latency of a function by
   instrumenting its call site.  Basically, I find the call instruction,
   and insert a kprobe with a pre-handler and post-handler at that point.
   The pre-handler measures the latency (reads the TSC counter).  The
   post-handler measures the latency again, and subtracts the value that
   was read in the pre-handler to compute the total latency of the called
   function.
  
  This sounds ok...
 
 So what you are really measuring is the latency of just that single 
 instruction where you have inserted the probe i.e. because your
 pre-handler is called just before the probed instruction is executed and
 your post-handler is called right after you probed instruction is 
 single-stepped.

Exactly - I want to profile that single instance of the call.

  
   So to measure the latency of foo(), I basically want kprobes to do this:
   pre_handler();
   foo();
 When you insert a probe, you are inserting  probe on an instruction boundary
 and not at function level.
   post_handler();
 
 Hence the above looks like
 
 pre-handler()
 Probed-instruction; // most likely the first instruction in the foo();
 post-hanlder()
 rest-of-foo()

I see.

   
   The problem is that the latencies that I am getting are consistently low
   (~10,000 cycles).  When I manually instrument the functions, the latency
   is about 20,000,000 cycles.  Clearly something is not right here.
 As I mentioned above what you are seeing is the latency of just the
 probed instruction and hence it is very very low compared to
 the latency of the function foo().
 
  You could try a a couple of approaches for starters.
 I agree with Ananth, you can try the below approaches
 for your measurements.
 
  
  a. As you mention above, a kprobe on the function invocation and the
  other on the instruction following the call; both need just pre_handlers. 
  
  b.
  - Insert a kprobe and a kretprobe on foo()
  - The kprobe needs to have only a pre_handler that'll measure the latency
  - A similar handler for the kretprobe handler can measure the latency
  again and their difference will give you foo()'s latency.
  
  b though will require you to do some housekeeping in case foo() is
  reentrant to track which return instance corresponds to which call.
  
  Ananth
  
  PS: There was a thought of providing a facility to run a handler at
  function entry even when just a kretprobe is used. Maybe we need to
  relook at that; it'd have been useful in this case.

Thanks for the clarifications!

Avishay

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-25 Thread Ananth N Mavinakayanahalli
On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> Hello,
> I am trying to use kprobes to measure the latency of a function by
> instrumenting its call site.  Basically, I find the call instruction,
> and insert a kprobe with a pre-handler and post-handler at that point.
> The pre-handler measures the latency (reads the TSC counter).  The
> post-handler measures the latency again, and subtracts the value that
> was read in the pre-handler to compute the total latency of the called
> function.

This sounds ok...

> So to measure the latency of foo(), I basically want kprobes to do this:
> pre_handler();
> foo();
> post_handler();
> 
> The problem is that the latencies that I am getting are consistently low
> (~10,000 cycles).  When I manually instrument the functions, the latency
> is about 20,000,000 cycles.  Clearly something is not right here.

Is foo() called from too many different places? If so, are you
interested with only the invocation of foo() from a specific callsite?
 
> Is this a known issue?  Instead of using the post-handler, I can try to
> add a kprobe to the following instruction with a pre-handler.  I was
> just curious if there was something fundamentally wrong with the
> approach I took, or maybe a bug that you should be made aware of.

I am not too sure... single-stepping a "call" instruction from a
different memory location (single-stepping out of line) requires some
fixups and kprobes handles such fixups just fine (see resume_execution()
in arch//kernel/kprobes.c)

You could try a a couple of approaches for starters.

a. As you mention above, a kprobe on the function invocation and the
other on the instruction following the call; both need just pre_handlers. 

b.
- Insert a kprobe and a kretprobe on foo()
- The kprobe needs to have only a pre_handler that'll measure the latency
- A similar handler for the kretprobe handler can measure the latency
again and their difference will give you foo()'s latency.

 though will require you to do some housekeeping in case foo() is
reentrant to track which return instance corresponds to which call.

Ananth

PS: There was a thought of providing a facility to run a handler at
function entry even when just a kretprobe is used. Maybe we need to
relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


KPROBES: Instrumenting a function's call site

2007-09-25 Thread Avishay Traeger
Hello,
I am trying to use kprobes to measure the latency of a function by
instrumenting its call site.  Basically, I find the call instruction,
and insert a kprobe with a pre-handler and post-handler at that point.
The pre-handler measures the latency (reads the TSC counter).  The
post-handler measures the latency again, and subtracts the value that
was read in the pre-handler to compute the total latency of the called
function.

So to measure the latency of foo(), I basically want kprobes to do this:
pre_handler();
foo();
post_handler();

The problem is that the latencies that I am getting are consistently low
(~10,000 cycles).  When I manually instrument the functions, the latency
is about 20,000,000 cycles.  Clearly something is not right here.

Is this a known issue?  Instead of using the post-handler, I can try to
add a kprobe to the following instruction with a pre-handler.  I was
just curious if there was something fundamentally wrong with the
approach I took, or maybe a bug that you should be made aware of.

Please CC me on any replies (not subscribed to LKML).

Thanks,
Avishay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


KPROBES: Instrumenting a function's call site

2007-09-25 Thread Avishay Traeger
Hello,
I am trying to use kprobes to measure the latency of a function by
instrumenting its call site.  Basically, I find the call instruction,
and insert a kprobe with a pre-handler and post-handler at that point.
The pre-handler measures the latency (reads the TSC counter).  The
post-handler measures the latency again, and subtracts the value that
was read in the pre-handler to compute the total latency of the called
function.

So to measure the latency of foo(), I basically want kprobes to do this:
pre_handler();
foo();
post_handler();

The problem is that the latencies that I am getting are consistently low
(~10,000 cycles).  When I manually instrument the functions, the latency
is about 20,000,000 cycles.  Clearly something is not right here.

Is this a known issue?  Instead of using the post-handler, I can try to
add a kprobe to the following instruction with a pre-handler.  I was
just curious if there was something fundamentally wrong with the
approach I took, or maybe a bug that you should be made aware of.

Please CC me on any replies (not subscribed to LKML).

Thanks,
Avishay

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KPROBES: Instrumenting a function's call site

2007-09-25 Thread Ananth N Mavinakayanahalli
On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
 Hello,
 I am trying to use kprobes to measure the latency of a function by
 instrumenting its call site.  Basically, I find the call instruction,
 and insert a kprobe with a pre-handler and post-handler at that point.
 The pre-handler measures the latency (reads the TSC counter).  The
 post-handler measures the latency again, and subtracts the value that
 was read in the pre-handler to compute the total latency of the called
 function.

This sounds ok...

 So to measure the latency of foo(), I basically want kprobes to do this:
 pre_handler();
 foo();
 post_handler();
 
 The problem is that the latencies that I am getting are consistently low
 (~10,000 cycles).  When I manually instrument the functions, the latency
 is about 20,000,000 cycles.  Clearly something is not right here.

Is foo() called from too many different places? If so, are you
interested with only the invocation of foo() from a specific callsite?
 
 Is this a known issue?  Instead of using the post-handler, I can try to
 add a kprobe to the following instruction with a pre-handler.  I was
 just curious if there was something fundamentally wrong with the
 approach I took, or maybe a bug that you should be made aware of.

I am not too sure... single-stepping a call instruction from a
different memory location (single-stepping out of line) requires some
fixups and kprobes handles such fixups just fine (see resume_execution()
in arch/arch/kernel/kprobes.c)

You could try a a couple of approaches for starters.

a. As you mention above, a kprobe on the function invocation and the
other on the instruction following the call; both need just pre_handlers. 

b.
- Insert a kprobe and a kretprobe on foo()
- The kprobe needs to have only a pre_handler that'll measure the latency
- A similar handler for the kretprobe handler can measure the latency
again and their difference will give you foo()'s latency.

b though will require you to do some housekeeping in case foo() is
reentrant to track which return instance corresponds to which call.

Ananth

PS: There was a thought of providing a facility to run a handler at
function entry even when just a kretprobe is used. Maybe we need to
relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/