Joxean's tool is similar to Nilsimsa or (as he mentions) ssdeep: it finds
near-duplicate instances of the same underlying data, provided the
differences are only small bit-level changes (such as those between
versions).  It's obviously not a magic unpacker for arbitrary viruses, though.

His approach is, by its very nature, a fuzzy similarity metric, which means
that if you run it sequentially over small chunks of two files, you get a
fuzzy diff.
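
To make the chunk-by-chunk idea concrete, here is a minimal sketch.  It is
not DeepToad's algorithm: difflib's SequenceMatcher stands in for the fuzzy
hash comparison, and the names chunk_similarity and fuzzy_diff, the chunk
size, and the threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def chunk_similarity(old: bytes, new: bytes, chunk_size: int = 64):
    """Yield (offset, ratio) for each pair of aligned chunks.

    ratio is a float in [0.0, 1.0]; 1.0 means the two chunks are identical.
    """
    length = max(len(old), len(new))
    for offset in range(0, length, chunk_size):
        a = old[offset:offset + chunk_size]
        b = new[offset:offset + chunk_size]
        # Assumption: SequenceMatcher is only a stand-in for a real fuzzy
        # hash comparison such as the one ssdeep or DeepToad would provide.
        ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
        yield offset, ratio


def fuzzy_diff(old: bytes, new: bytes, chunk_size: int = 64,
               threshold: float = 0.9):
    """Return the (offset, ratio) pairs whose similarity is below threshold."""
    return [(offset, ratio)
            for offset, ratio in chunk_similarity(old, new, chunk_size)
            if ratio < threshold]


if __name__ == "__main__":
    old = bytes(range(256))
    new = bytearray(old)
    new[64:128] = b"\xff" * 64  # overwrite one 64-byte chunk
    for offset, ratio in fuzzy_diff(old, bytes(new)):
        print(f"chunk at offset {offset} differs (similarity {ratio:.2f})")
```

Because the chunks sit at fixed offsets, a single inserted byte shifts
everything after it and makes every later chunk look different.  A real tool
would have to pick chunk boundaries from the content instead.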

Detecting multiple files of the same file type is actually a different
problem, and an interesting one.  The best approach is to take a large
number of samples that *are* your file type and a large number that *are
not* (drawn from many different other types, not all from the same one),
then look for strings or statistical patterns that show up in the member set
but not in the alternate set.  Those fingerprints are what you then search
for in new samples.
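
As a rough sketch of that member/non-member procedure, the following looks
only at fixed-length byte strings (n-grams) and ignores statistical
patterns.  The function names, the n-gram length, and the min_support
parameter are assumptions made for illustration; nothing here comes from
DeepToad or from any browser.

```python
def ngrams(data: bytes, n: int) -> set:
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def find_fingerprints(members, non_members, n: int = 4,
                      min_support: float = 1.0) -> set:
    """Find n-grams common in members and absent from every non-member.

    min_support is the fraction of member samples that must contain an
    n-gram for it to count as a fingerprint (1.0 means all of them).
    """
    counts = {}
    for sample in members:
        # Using a set per sample counts each n-gram at most once per file.
        for gram in ngrams(sample, n):
            counts[gram] = counts.get(gram, 0) + 1

    seen_outside = set()
    for sample in non_members:
        seen_outside |= ngrams(sample, n)

    needed = min_support * len(members)
    return {gram for gram, count in counts.items()
            if count >= needed and gram not in seen_outside}


def looks_like_member(sample: bytes, fingerprints: set, n: int = 4) -> bool:
    """Report whether sample contains at least one known fingerprint."""
    return bool(ngrams(sample, n) & fingerprints)


if __name__ == "__main__":
    members = [b"%PDF-1.4 abc", b"%PDF-1.3 xyz"]
    non_members = [b"GIF89a abc", b"PK\x03\x04 xyz"]
    fingerprints = find_fingerprints(members, non_members)
    print(sorted(fingerprints))
    print(looks_like_member(b"%PDF-1.7 other", fingerprints))
```

With only a handful of samples the fingerprints overfit badly, which is why
both sets need to be large and the non-member set needs to be varied.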

It's not terribly common that you actually need to do this, though.  Browsers
have to do it to some extent because MIME types are unreliable, but they
tune their detection rules by hand.


On Tue, Jan 5, 2010 at 3:56 PM, T Biehn <tbi...@gmail.com> wrote:

> I can see what you're saying, it could be useful for finding
> differences in different versions of the same binary but from what I
> can see Joxean's app is meant to group files of the same 'type,' not
> provide 'diff' capabilities.
>
> -Travis
>
> On Tue, Jan 5, 2010 at 9:51 AM, Dan Kaminsky <d...@doxpara.com> wrote:
> > I looked into a fair amount of this sort of normalization back when I was
> > playing with dotplots.  The idea was to upgrade from simple Levenshtein
> > string comparison (with no knowledge of variable length x86 instructions,
> > pointers that shift from compile to compile, etc) to something with at
> > least some domain specific knowledge.  What I found, somewhat surprisingly,
> > was that dumb string comparison was more than enough.  In fact, when I
> > compared pre-patch and post-patch builds, it was easy to directly see when
> > content was added, removed, shifted in location, etc.  Joxean's going to
> > have much the same result -- as basic as his similarity metric is, he'll
> > get the broad strokes just fine.
> >
> > Ultimately the best approach is to build a graph of how functions interact
> > and measure graph isomorphism, but of course Halvar figured that out years
> > ago :)
> >
> > On Tue, Jan 5, 2010 at 3:41 PM, T Biehn <tbi...@gmail.com> wrote:
> >>
> >> Hmm,
> >> Wouldn't it be more useful to the sec community to have an algorithm
> >> that abstracts at the -interpreted- content level? That is when
> >> analyzing binaries I wouldn't think that this would classify two with
> >> near identical functionality together, even though it is removing a
> >> significant chunk of information during the hash pass.
> >>
> >> I would largely assume that your algorithm, as is, works best on
> >> uncompressed bitmaps. Is there something I'm missing?
> >>
> >> -Travis
> >>
> >> On Sun, Jan 3, 2010 at 6:37 AM, Joxean Koret <joxeanko...@yahoo.es> wrote:
> >> > Hi all,
> >> >
> >> > I'm happy to announce the very first public release of the open source
> >> > project DeepToad, a tool for computing fuzzy hashes from files.
> >> >
> >> > DeepToad can generate signatures, cluster files and/or directories
> >> > and compare them. It's inspired by the very good tool ssdeep [1] and, in
> >> > fact, both projects are very similar.
> >> >
> >> > The complete project is written in pure python and is distributed under
> >> > the LGPL license [2].
> >> >
> >> > Links:
> >> > Project's Web Page http://code.google.com/p/deeptoad/
> >> > Download Web Page http://code.google.com/p/deeptoad/downloads/list
> >> > Wiki http://code.google.com/p/deeptoad/w/list
> >> >
> >> > References:
> >> > [1] http://ssdeep.sourceforge.net/
> >> > [2] http://www.gnu.org/licenses/lgpl.html
> >> >
> >> > Regards && Happy new year!
> >> > Joxean Koret
> >> >
> >> >
> >> > _______________________________________________
> >> > Full-Disclosure - We believe in it.
> >> > Charter: http://lists.grok.org.uk/full-disclosure-charter.html
> >> > Hosted and sponsored by Secunia - http://secunia.com/
> >> >
> >>
> >>
> >>
> >> --
> >> FD1D E574 6CAB 2FAF 2921  F22E B8B7 9D0D 99FF A73C
> >> http://pgp.mit.edu:11371/pks/lookup?search=tbiehn&op=index&fingerprint=on
> >> http://pastebin.com/f6fd606da
> >>
> >
> >
>
>
>
>
