Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead

2015-02-19 Thread Bill Williams

On 02/19/2015 10:25 AM, budchan chao wrote:

Hi All,

If I understand correctly, Dyninst uses ptrace to attach to and modify
the mutatee. I want to measure how much runtime overhead mutating an
instrumentation point incurs. I am also interested in the runtime
overhead of a trampoline. Are there any existing benchmarks I can run
to get these numbers? If not, I would really appreciate any tips for
building such benchmarks, as I am new to the project.


Obligatory disclaimer: Dyninst overhead is highly variable depending on 
the context in which you're using it and your skill at writing an 
efficient mutator. I'm trying to give good general information below; if 
you can share a bit about the environment you're working in, I (and the 
rest of the list) can provide more focused advice.


We've generally used SPECINT/SPECFP as our baseline set of mutatees for 
overhead testing. Precise benchmarking of various components of our 
instrumentation overhead can require some tweaking of Dyninst internals; 
we haven't released any standard benchmarking mutators (that I'm aware 
of) recently.


You can insert null/no-op instrumentation at your desired 
instrumentation points and get a reasonable benchmark of the 
springboard/relocation overhead associated with instrumenting those points.
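
As a minimal sketch of how such a no-op benchmark might be reduced to a
single number: run the same mutatee on the same input once
uninstrumented and once with null snippets inserted, then divide the
difference by how often the instrumented points executed. The function
name, its parameters, and the figures in the usage note are illustrative
assumptions, not part of Dyninst.

```cpp
#include <cassert>
#include <cstdint>

// Average cost added per executed instrumentation point, in nanoseconds.
// baseline_ns:     wall-clock time of the uninstrumented run (user-measured)
// instrumented_ns: wall-clock time of the run with null/no-op snippets
// hit_count:       how many times the instrumented points executed
// All three inputs are assumed to come from the user's own measurements.
double overhead_per_hit_ns(double baseline_ns, double instrumented_ns,
                           std::uint64_t hit_count) {
    assert(hit_count > 0);
    return (instrumented_ns - baseline_ns) / static_cast<double>(hit_count);
}
```

For example, a hypothetical 1.0 s baseline and 1.5 s instrumented run
with 100 million point executions gives
overhead_per_hit_ns(1.0e9, 1.5e9, 100000000) = 5 ns per hit. Averaging
over several runs would reduce timing noise.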



For the trampoline overhead, I think I can time a loop that calls an
empty function (inlined) with entry instrumentation. For the first one
(the cost of mutating an instrumentation point), I think measuring the
elapsed time between processAttach and continueExecution would do the
trick. Am I correct? I just want to make sure I am thinking about this
correctly.

Calling an empty function with entry instrumentation is going to give 
you skewed relative overhead and may or may not give you useful absolute 
overhead. Relative overhead will, to a first approximation, be 
proportional to the fraction of new instructions added, and most 
functions you'd want to instrument in real code are not actually empty. 
An empty function is also going to have potentially very different cache 
behavior from a real-world function, and the perturbations that 
instrumentation causes there will have little to do with the sorts of 
perturbations we see in real applications.
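
To put rough numbers on that first approximation, the sketch below
computes relative overhead as added instructions over original
instructions. The instruction counts are hypothetical inputs chosen for
illustration, not measurements of Dyninst-generated code.

```cpp
#include <cassert>

// First-approximation model: relative overhead is the number of
// instructions added by instrumentation divided by the number of
// instructions the function executed originally.
// Both counts are hypothetical inputs, not values obtained from Dyninst.
double relative_overhead(double added_instructions,
                         double original_instructions) {
    assert(original_instructions > 0.0);
    return added_instructions / original_instructions;
}
```

With a hypothetical 10 added instructions, a 2-instruction function
shows relative_overhead(10, 2) = 5.0 (500%), while a 1000-instruction
function shows 0.01 (1%), which is why a nearly empty function
exaggerates the relative cost.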


There should be some measure of parsing time that's amortized into the 
first instrumentation operation on a given DSO in a process. I don't 
know how precisely you want to separate parsing, code generation, and 
the actual mechanics of inserting a generated binary blob, but what 
you're proposing to measure between attach and continue is going to 
contain some of each of those.
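
One way to see how much of each cost lands where is to time every
mutator step separately rather than the whole attach-to-continue window.
The helper below is a generic sketch using only the C++ standard
library; time_phase is a hypothetical name, and the callable passed to
it stands in for whatever mutator step is being measured (for example,
the first insertion into a DSO versus a later one).

```cpp
#include <chrono>
#include <utility>

// Runs one phase of the mutator and returns its wall-clock duration in
// seconds. steady_clock is used because it is monotonic, so the result
// is not affected by system clock adjustments.
template <typename Phase>
double time_phase(Phase&& phase) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Phase>(phase)();  // the step being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing the time of the first insertion into a DSO with a later
insertion into the same DSO should give a rough idea of how much
parsing was folded into the first one, under the assumption that
parsing is not repeated.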



I was also wondering whether there is a way to do the dynamic
instrumentation in-band, if that makes sense (for example, using a
separate thread in the same process, so that no separate mutator
process is needed).


There have been various projects in the group over the years that do 
in-band (or first-party, as we refer to it) instrumentation. As far as I 
know, none of them have taken a separate thread approach. There's also 
Dyninst's binary rewriting mode, where the 
parsing/codegen/instrumentation process occurs once up-front and then 
you run the instrumented binary on its own.



Regards
Chan


___
Dyninst-api mailing list
Dyninst-api@cs.wisc.edu
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api




--
--bw

Bill Williams
Paradyn Project
b...@cs.wisc.edu


Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead

2015-02-19 Thread budchan chao
Thanks for the reply. Please find some responses inline. 

 On Thursday, 19 February 2015 1:50 PM, Bill Williams b...@cs.wisc.edu 
wrote:
   

 On 02/19/2015 10:25 AM, budchan chao wrote:
 Hi All,

 If I understand correctly, Dyninst uses ptrace to attach to and modify
 the mutatee. I want to measure how much runtime overhead mutating an
 instrumentation point incurs. I am also interested in the runtime
 overhead of a trampoline. Are there any existing benchmarks I can run
 to get these numbers? If not, I would really appreciate any tips for
 building such benchmarks, as I am new to the project.

Obligatory disclaimer: Dyninst overhead is highly variable depending on 
the context in which you're using it and your skill at writing an 
efficient mutator. I'm trying to give good general information below; if 
you can share a bit about the environment you're working in, I (and the 
rest of the list) can provide more focused advice.

I am working with x86 ELF binaries built with GCC.

We've generally used SPECINT/SPECFP as our baseline set of mutatees for 
overhead testing. Precise benchmarking of various components of our 
instrumentation overhead can require some tweaking of Dyninst internals; 
we haven't released any standard benchmarking mutators (that I'm aware 
of) recently.

I have several SPECINT applications (h264 etc.) that I plan to use later
for benchmarking with Dyninst, to get an idea of the typical overheads
involved. Any suggestions for a good set of benchmark applications that
cover varied runtime behaviors?

You can insert null/no-op instrumentation at your desired
instrumentation points and get a reasonable benchmark of the 
springboard/relocation overhead associated with instrumenting those points.

I will try that.

 For the trampoline overhead, I think I can time a loop that calls an
 empty function (inlined) with entry instrumentation. For the first one
 (the cost of mutating an instrumentation point), I think measuring the
 elapsed time between processAttach and continueExecution would do the
 trick. Am I correct? I just want to make sure I am thinking about this
 correctly.

Calling an empty function with entry instrumentation is going to give 
you skewed relative overhead and may or may not give you useful absolute 
overhead. Relative overhead will, to a first approximation, be 
proportional to the fraction of new instructions added, and most 
functions you'd want to instrument in real code are not actually empty. 
An empty function is also going to have potentially very different cache 
behavior from a real-world function, and the perturbations that 
instrumentation causes there will have little to do with the sorts of 
perturbations we see in real applications.

This indeed makes sense.

There should be some measure of parsing time that's amortized into the 
first instrumentation operation on a given DSO in a process. I don't 
know how precisely you want to separate parsing, code generation, and 
the actual mechanics of inserting a generated binary blob, but what 
you're proposing to measure between attach and continue is going to 
contain some of each of those.

What if I insert a snippet, then somehow remove it (I haven't yet found
the API calls for removing snippets at runtime), and re-insert it? Would
Dyninst cache the generated code and simply reuse it the second time
around? In that case, I could time the second insertion to leave out
the code generation overhead.

 I was also wondering whether there is a way to do the dynamic
 instrumentation in-band, if that makes sense (for example, using a
 separate thread in the same process, so that no separate mutator
 process is needed).

There have been various projects in the group over the years that do 
in-band (or first-party, as we refer to it) instrumentation. As far as I 
know, none of them have taken a separate thread approach. There's also 
Dyninst's binary rewriting mode, where the 
parsing/codegen/instrumentation process occurs once up-front and then 
you run the instrumented binary on its own.

Interesting.


 Regards
 Chan





-- 
--bw

Bill Williams
Paradyn Project
b...@cs.wisc.edu



[DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead

2015-02-19 Thread budchan chao
Hi All,

If I understand correctly, Dyninst uses ptrace to attach to and modify
the mutatee. I want to measure how much runtime overhead mutating an
instrumentation point incurs. I am also interested in the runtime
overhead of a trampoline. Are there any existing benchmarks I can run
to get these numbers? If not, I would really appreciate any tips for
building such benchmarks, as I am new to the project.

For the trampoline overhead, I think I can time a loop that calls an
empty function (inlined) with entry instrumentation. For the first one
(the cost of mutating an instrumentation point), I think measuring the
elapsed time between processAttach and continueExecution would do the
trick. Am I correct? I just want to make sure I am thinking about this
correctly.

I was also wondering whether there is a way to do the dynamic
instrumentation in-band, if that makes sense (for example, using a
separate thread in the same process, so that no separate mutator
process is needed).

Regards
Chan