On 04/17/2014 11:08 AM, E.Robbins wrote:
On 17/04/2014 16:40, Bill Williams wrote:
Oh. One other thing--if you're trying to analyze PE files on Linux,
that's not presently going to work. It might be possible, if you have a
Linux system with the necessary Windows headers present and you know of
a replacement for the debug SDK, to coerce a Linux build of Symtab to
speak PE.
Thanks. We are indeed trying to analyse PE files on Linux. I didn't realise
that this wasn't supported. When you say the debug SDK, do you mean some kind
of Microsoft Visual Studio debugger?
No, there's a Debug Interface Access (DIA) SDK that's available and
does a fair bit of symbol parsing for PE files, which we then bake into
Symtab form. Its accessibility and redistributability have been somewhat
variable, IIRC, but if you have a non-Express version of Visual Studio,
you have full access to the MS SDKs, last I checked.
You could probably pull the text section out via objdump or
similar and stuff it into a fake ELF file.
We'll have to think about that, but I guess it's certainly an option in the short
term. We are mostly looking at malware, so symbols are largely useless, but we
will probably need to know about linkage, the entry point, etc.
Yeah, if you're concerned with malware, you'll probably want a custom
CodeSource whether you're working on Linux or Windows.
I think I also have a
memory-backed CodeSource implementation floating around somewhere that
you could use as a starting point--as long as you can find the text
section and either don't care about symbols or can find them without
Windows headers, mocking up a CodeSource that speaks PE on Linux is a
simple matter of engineering.
What do you mean by a memory-backed CodeSource? We would be interested in
anything that can help, though obviously we may decide it's too big a task.
The two CodeSource implementations we distribute are Symtab-based and
SymLite-based; both work with files on disk, as one does. For internal
testing purposes, it's convenient to be able to splat a blob of code
into memory and parse it, and the CodeSource implementation to do that
is what I mean by "memory-backed".
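To make the idea concrete, here is a minimal C++ sketch of a memory-backed code region, assuming all a parser needs is "is this address valid?" and "give me a pointer to the bytes there". The class `MemoryCodeRegion` and its method names are hypothetical illustrations, not Dyninst's CodeSource API; a real implementation would subclass Dyninst's CodeSource and implement whatever its headers actually declare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT Dyninst's CodeSource interface. It only shows the
// "memory-backed" idea: a blob of code held in memory and treated as if it
// were loaded at a chosen base address, with no file on disk behind it.
class MemoryCodeRegion {
public:
    MemoryCodeRegion(std::uint64_t base, std::vector<unsigned char> bytes)
        : base_(base), bytes_(std::move(bytes)) {}

    // True if addr falls inside [base, base + length).
    bool isValidAddress(std::uint64_t addr) const {
        return addr >= base_ && addr - base_ < bytes_.size();
    }

    // Pointer to the byte at addr, or nullptr if addr is outside the blob.
    const unsigned char* getPtrToInstruction(std::uint64_t addr) const {
        if (!isValidAddress(addr)) return nullptr;
        return bytes_.data() + (addr - base_);
    }

    std::uint64_t offset() const { return base_; }       // start address
    std::size_t length() const { return bytes_.size(); } // size in bytes

private:
    std::uint64_t base_;                // address the blob is "loaded" at
    std::vector<unsigned char> bytes_;  // the raw code bytes
};
```

A parser handed such a region would ask it for bytes at each address it decodes; nothing about it depends on the bytes having come from an ELF or PE file.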
It occurs to me that what you may actually want is a Windows
implementation of SymLite--something that knows how to mmap in sections,
read section headers and the entry point, and optionally can give you a
lightweight representation of each symbol. That would then plug into the
existing SymLiteCodeSource and should work seamlessly.
It's engineering we haven't done because
parsing PE on Linux is not of much use to Dyninst without a *very*
full-featured cross-format Symtab backing it, such that we could rewrite
PE files on Linux.
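For a sense of what the PE side of a SymLite-style reader involves, the following is a rough C++ sketch that pulls the entry point and section headers out of a PE image held in memory, using only byte offsets from the Microsoft PE/COFF specification, so no Windows headers are needed on Linux. `PeSection`, `PeInfo`, and `parsePe` are hypothetical names and not part of Dyninst; the sketch skips ImageBase, the data directories (imports/exports, i.e. linkage), and symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types; field offsets follow the Microsoft PE/COFF specification.
struct PeSection {
    std::string name;                  // e.g. ".text" (up to 8 bytes, NUL-padded)
    std::uint32_t virtualSize = 0;
    std::uint32_t virtualAddress = 0;  // RVA once loaded
    std::uint32_t rawSize = 0;         // SizeOfRawData
    std::uint32_t rawOffset = 0;       // PointerToRawData (file offset)
    std::uint32_t characteristics = 0; // 0x20000000 = IMAGE_SCN_MEM_EXECUTE
};

struct PeInfo {
    bool is64 = false;          // PE32+ (magic 0x20b) vs PE32 (0x10b)
    std::uint32_t entryRva = 0; // AddressOfEntryPoint
    std::vector<PeSection> sections;
};

// Bounds-checked little-endian reads; no Windows headers required.
static bool rd16(const std::vector<unsigned char>& b, std::size_t off, std::uint16_t& out) {
    if (off + 2 > b.size()) return false;
    out = static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
    return true;
}

static bool rd32(const std::vector<unsigned char>& b, std::size_t off, std::uint32_t& out) {
    if (off + 4 > b.size()) return false;
    out = 0;
    for (int i = 3; i >= 0; --i) out = (out << 8) | b[off + i];
    return true;
}

std::optional<PeInfo> parsePe(const std::vector<unsigned char>& b) {
    if (b.size() < 0x40 || b[0] != 'M' || b[1] != 'Z') return std::nullopt;
    std::uint32_t lfanew = 0;
    rd32(b, 0x3C, lfanew);                      // e_lfanew: offset of "PE\0\0"
    const std::size_t pe = lfanew;
    if (pe + 24 > b.size() || b[pe] != 'P' || b[pe + 1] != 'E' || b[pe + 2] || b[pe + 3])
        return std::nullopt;
    const std::size_t coff = pe + 4;            // 20-byte COFF file header
    std::uint16_t numSections = 0, optSize = 0, magic = 0;
    rd16(b, coff + 2, numSections);             // NumberOfSections
    rd16(b, coff + 16, optSize);                // SizeOfOptionalHeader
    const std::size_t opt = coff + 20;          // optional header
    PeInfo info;
    if (!rd16(b, opt, magic) || !rd32(b, opt + 16, info.entryRva)) return std::nullopt;
    if (magic != 0x10b && magic != 0x20b) return std::nullopt;
    info.is64 = (magic == 0x20b);
    std::size_t sh = opt + optSize;             // section table follows
    for (std::uint16_t i = 0; i < numSections; ++i, sh += 40) {
        if (sh + 40 > b.size()) return std::nullopt;
        PeSection s;
        std::size_t n = 0;
        while (n < 8 && b[sh + n] != 0) ++n;
        s.name.assign(reinterpret_cast<const char*>(&b[sh]), n);
        rd32(b, sh + 8, s.virtualSize);
        rd32(b, sh + 12, s.virtualAddress);
        rd32(b, sh + 16, s.rawSize);
        rd32(b, sh + 20, s.rawOffset);
        rd32(b, sh + 36, s.characteristics);
        info.sections.push_back(s);
    }
    return info;
}
```

With the section list in hand, the executable sections (those with the 0x20000000 flag) can be found in the file at `rawOffset` and placed at ImageBase plus `virtualAddress`, which is roughly what a memory-backed code source would need to be fed.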
Fair enough. We are somewhat at odds with Dyninst's goals, because we are
doing static analysis and mostly use it for its control flow recovery, which is
very good, and to some extent for reading symbols.
The obvious answer, then, is to use Windows. Can the Windows version of Dyninst
work on ELF binaries?
The Windows version doesn't speak ELF, though I think teaching it to is more
practical than getting Linux Dyninst to speak PE fully. I haven't
checked recently whether libelf/libdwarf build cleanly on Windows, but
if they do then that's a pretty straightforward project; we'd build with
libelf, libdwarf, and the dynElf/dynDwarf wrapper libraries, add the
various ELF-reading source files to the build, and toss a mechanism into
Symtab::openFile to check the file type and open things on the right
path. (Note: straightforward does not mean low-effort; we'd almost
certainly have to redesign some of the class structure so that the file
type could be a runtime decision rather than a build-time one. But it's
doable in principle.)
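A runtime file-type check could start as simply as sniffing magic bytes. This C++ sketch assumes the file's bytes are already in memory; `BinaryFormat` and `detectFormat` are hypothetical names, not the actual Symtab::openFile logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: the kind of check a runtime dispatch could perform before
// choosing an ELF or PE reader. Not Dyninst's actual Symtab::openFile code.
enum class BinaryFormat { Elf, Pe, Unknown };

BinaryFormat detectFormat(const std::vector<unsigned char>& b) {
    // ELF: the first four bytes are 0x7F 'E' 'L' 'F'.
    if (b.size() >= 4 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' && b[3] == 'F')
        return BinaryFormat::Elf;
    // PE: a DOS "MZ" stub whose e_lfanew field (offset 0x3C) points at "PE\0\0".
    if (b.size() >= 0x40 && b[0] == 'M' && b[1] == 'Z') {
        std::size_t pe = static_cast<std::size_t>(b[0x3C]) |
                         (static_cast<std::size_t>(b[0x3D]) << 8) |
                         (static_cast<std::size_t>(b[0x3E]) << 16) |
                         (static_cast<std::size_t>(b[0x3F]) << 24);
        if (pe + 4 <= b.size() && b[pe] == 'P' && b[pe + 1] == 'E' &&
            b[pe + 2] == 0 && b[pe + 3] == 0)
            return BinaryFormat::Pe;
    }
    return BinaryFormat::Unknown;
}
```

The caller would then pick the ELF or PE backend based on the result, which is the shift from a build-time to a runtime decision described above.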
I think your best bet would be a cross-platform PE implementation of
SymLite, though.
Thanks a lot,
Ed
--
--bw
Bill Williams
Paradyn Project
b...@cs.wisc.edu
_______________________________________________
Dyninst-api mailing list
Dyninst-api@cs.wisc.edu
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api