Hello Bill,

Thank you for your prompt reply. Now I understand more about this
functionality of DynInst. I will let you know if I have any other questions.

Thanks again.

Sincerely,
Shuai

On Wed, Jan 6, 2016 at 3:22 PM, Bill Williams <b...@cs.wisc.edu> wrote:

> On 01/06/2016 02:15 PM, Shuai Wang wrote:
>
> Hello Bill,
>
> Thank you for your information. I am wondering: besides the machine
> learning-based method, is there any other mechanism implemented in DynInst?
> For example, would you consider address 0x80102030 to be a function entry
> point if a call instruction (*call 0x80102030*) can be found in the
> disassembled output?
>
> Naturally; we use the entry point of the binary, available function
> symbols, and internal control flow to generate function entry points as
> well. The machine learning approach is used to cover the gaps in the binary
> where authoritative information is missing.
>
> (And, of course, we also use the calls made by functions that are
> discovered through gap parsing to find further function entry points. The
> rationale here is derived from Nate Rosenblum's earlier gap parsing work;
> the intuition is that if code reachable from a likely (probability P) FEP F
> includes "call G", G is a FEP with probability Q >= P.)
>
> --bw
>
>
>
> Sincerely,
> Shuai
>
> On Wed, Jan 6, 2016 at 12:08 PM, Bill Williams <b...@cs.wisc.edu> wrote:
>
>> On 01/06/2016 10:43 AM, Shuai Wang wrote:
>>
>> Dear list,
>>
>> I am writing to ask how to use DynInst to recognize *function entry
>> points (memory addresses) in stripped binaries*.
>>
>>
>> I successfully installed the 32-bit DynInst 9.10, and I use a DynInst
>> client that iterates over all the functions with the following code to *dump
>> all the function entry point addresses from stripped binaries*.
>>
>>                      .......
>>                      vector<BPatch_module *> *modules = appImage->getModules();
>>                      ......
>>                      vector<BPatch_function *> *funcs = (*module_iter)->getProcedures();
>>                      vector<BPatch_function *>::iterator func_iter;
>>                      for (func_iter = funcs->begin(); func_iter != funcs->end(); ++func_iter) {
>>                           char functionName[1024];
>>                           (*func_iter)->getName(functionName, 1024);
>>                           cout << "-- Function : " << functionName << " --" << endl;
>>                      ......
>>
>> I extract the function entry point addresses from the function names.
>>
>>
>> I tested some CoreUtils binaries compiled with LLVM at the O2 optimization
>> level, and the precision/recall rates are generally very good!  *Precision: 0.99;
>>  Recall: 0.91*
>>
>> According to this paper
>> <ftp://ftp.cs.wisc.edu/paradyn/papers/Williams15Dyninst.pdf>, Section
>> 6.2, on average DynInst achieves over 0.97 precision and 0.93 recall on
>> 32-bit ELF binaries. This is very consistent with my test! But still, I am
>> not sure whether I did everything correctly.
>>
>> So here are my questions:
>>
>> 1. It seems that by leveraging a machine learning method to recognize
>> functions, DynInst needs a training process before recognition, but I
>> didn't do any training (although the results are pretty good). Is there
>> anything in particular I have to do before using DynInst?
>>
>> The training step has been done once and the resulting model is baked
>> into the Dyninst code base. Your experimental setup should be correct.
>>
>> 2. If there is a "pre-trained" model installed in DynInst 9.10 already,
>> what kinds of binaries was this model trained on? For example, can I use it
>> to test 32-bit ELF binaries compiled from LLVM with O3? Or ICC with O3?
>>
>> Dyninst was trained on the test set of binaries produced by the BAP group
>> at CMU, which includes binutils and coreutils binaries built with gcc and
>> icc at O0 through O3 (as well as Windows binaries, though those of course
>> produce a separate model). I expect the model to generalize decently to
>> LLVM binaries, and we'd be interested to hear your results. Our initial
>> indications are that these models, applied to modern compiler versions, are
>> not terribly sensitive to the toolchain used.
>>
>> --bw
>>
>>
>> Am I clear enough? I would appreciate it if anyone could give me some help!
>>
>> Sincerely,
>> Shuai
>>
>> _______________________________________________
>> Dyninst-api mailing list
>> dyninst-...@cs.wisc.edu
>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>
>>
>>
>
>
_______________________________________________
Dyninst-api mailing list
Dyninst-api@cs.wisc.edu
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
