Ah, that helps a lot, Dave.

P.S. Excuse my typos. Touched and not typed.
On Mar 25, 2013 5:39 PM, "David Raynor" <dray...@sourcefire.com> wrote:

> On Sat, Mar 23, 2013 at 3:34 PM, Kaushik Vaidyanathan <
> kvaid...@andrew.cmu.edu> wrote:
>
> > Hi Matt
> >
> > Thanks for your detailed explanation of how signatures get stored and
> > interpreted.
> >
> > I was looking through the code in libclamav to see what data formats are
> > used for string comparison. Some backtracking from cli_bm_scanbuff took me
> > to str.c, where I see there is a function, "cli_hex2str", which, if I
> > understand correctly, maps two hex digits to one character (unsigned char).
> > Would it be fair to speculate that this function is used by the ClamAV
> > engine to map two hex digits read from a signature or scanned file into one
> > char for string matching purposes?
> >
> > Thank you..
> >
> >
> > On Sat, Mar 23, 2013 at 11:02 AM, Matt Olney <mol...@sourcefire.com>
> > wrote:
> >
> > > Well... data is data. There is no difference (from a storage perspective)
> > > between an executable with an "inc ecx" instruction and a text document
> > > with an "A". Both are represented by the value 0x41. So from Clam's
> > > perspective, a signature matching a single A would be identical to a
> > > signature that detected a single "inc ecx" instruction. Both would look
> > > for 41.
> > >
> > > In short, your statement "some files are hex and some are character-based"
> > > isn't really accurate. At the risk of painting with a broad brush, I would
> > > say that all files are stored as a series of values, a series of bytes.
> > > How you display them is what differs. When I use 010 Editor to view a file
> > > as hex, I get a set of ASCII-hex representations. When I look at a file
> > > with a web browser, I get ASCII text. But underlying all of that is the
> > > same idea, a set of bytes. And that is how ClamAV treats all files.
> > >
> > > A signature with a 41 in it would be converted in memory to look for 0x41,
> > > a single byte of value 0x41. A signature written like that would detect an
> > > executable or PDF or Flash file or anything that has 0x41 in the data.
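To make the "data is data" point concrete, here is a minimal sketch in C. It is a naive byte search, not ClamAV's Boyer-Moore matcher, and the names find_bytes, sig_byte, sample_text and sample_code are hypothetical, made up for illustration. It only shows that one pattern byte, 0x41, is found the same way in a buffer holding text and in a buffer holding 32-bit x86 code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte "signature": the value 0x41. */
static const unsigned char sig_byte[] = { 0x41 };

/* The text "BAT": 0x42 0x41 0x54. The middle byte displays as 'A'. */
static const unsigned char sample_text[] = { 'B', 'A', 'T' };

/* 32-bit x86 code: nop (0x90), inc ecx (0x41), ret (0xC3). */
static const unsigned char sample_code[] = { 0x90, 0x41, 0xC3 };

/*
 * Hypothetical naive search (NOT ClamAV's matcher): returns the offset of
 * the first occurrence of pat in buf, or -1 if there is none. Both buffers
 * are treated as raw bytes; nothing here knows about "text" or "code".
 */
static long find_bytes(const unsigned char *buf, size_t buf_len,
                       const unsigned char *pat, size_t pat_len)
{
    assert(buf != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Calling find_bytes(sample_text, sizeof sample_text, sig_byte, sizeof sig_byte) and the same call on sample_code both report offset 1: the search sees only the byte value, whatever the file type.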
> > >
> > > Hope that answers your question.
> > >
> > > Matt
> > >
> > >
> > > On Fri, Mar 22, 2013 at 8:46 PM, Kaushik Vaidyanathan <
> > > kvaid...@andrew.cmu.edu> wrote:
> > >
> > > > Hi
> > > >
> > > > I have a basic question. Most body-based signatures are hex-based (let's
> > > > focus on fixed-string signatures alone for simplicity), whereas some of
> > > > the files are hex (EXE) or character-based (HTML).
> > > >
> > > > In the code I see unsigned chars used predominantly to represent patterns
> > > > and file contents. At the very core of the string matching algorithms,
> > > > mainly extended Boyer-Moore, I would like to understand how the datatypes
> > > > get manipulated.
> > > >
> > > > 1) Do the character-based files get translated to hex to compare with
> > > > body-based signatures?
> > > >
> > > > 2) Does the signature get treated as a string of chars?
> > > > If yes,
> > > > does a toy signature "fe" get treated as two chars (8 bits each) for "f"
> > > > and "e", or
> > > > does the code read the signature "fe" and map it into one character based
> > > > on the ASCII table (for example)?
> > > >
> > > > Thank you..
> > > > _______________________________________________
> > > > http://lurker.clamav.net/list/clamav-devel.html
> > > > Please submit your patches to our Bugzilla: http://bugs.clamav.net
> > > >
>
> Read from a signature, yes. Read from a file, no. To compare bytes quickly,
> it is better to use the in-file binary representation. It is more direct to
> say that cli_hex2str() converts the human-readable representation of a
> hexadecimal number into its binary equivalent. For any byte pattern to match,
> the signature-format equivalent will take twice as many bytes as the raw
> binary value.
>
> Example: "Hex" in ASCII
> Actual data is 3 bytes long. 1st byte: 0x48. 2nd byte: 0x65. 3rd byte: 0x78
> Signature-format equivalent is 6 bytes long, one for each hex digit.
>
> This is where the name of the function came from. Input and output are both
> char arrays (i.e. strings). The function takes in the "hex"-format version
> of the content [486578], and returns the content in a usable string format
> [Hex]. Hence, from "hex" to string.
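The conversion Dave describes can be sketched in a few lines of C. This is a hypothetical stand-in, not libclamav's cli_hex2str(); the names hex_digit_value and hex_to_bytes are invented here, and it assumes a plain hex string with no wildcards or other signature syntax. Each pair of hex digits in the input becomes one output byte, so "486578" (6 chars) becomes "Hex" (3 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if invalid. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical stand-in for the idea behind cli_hex2str() (NOT the real
 * implementation): converts a string of hex digits into raw bytes.
 * Returns a malloc'd buffer of strlen(hex)/2 bytes plus a trailing NUL,
 * or NULL on odd length, an invalid digit, or allocation failure.
 * The caller frees the result.
 */
static unsigned char *hex_to_bytes(const char *hex, size_t *out_len)
{
    assert(hex != NULL && out_len != NULL);

    size_t len = strlen(hex);
    if (len % 2 != 0)
        return NULL;               /* two hex digits are needed per byte */

    unsigned char *out = malloc(len / 2 + 1);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_digit_value(hex[i]);
        int lo = hex_digit_value(hex[i + 1]);
        if (hi < 0 || lo < 0) {
            free(out);
            return NULL;
        }
        /* High digit fills the upper 4 bits, low digit the lower 4 bits. */
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    out[len / 2] = '\0';
    *out_len = len / 2;
    return out;
}
```

With this sketch, hex_to_bytes("486578", &n) yields the three bytes 0x48 0x65 0x78 and sets n to 3, which is the halving in size described above.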
>
> Dave R.
>
> --
> ---
> Dave Raynor
> Sourcefire Vulnerability Research Team
> dray...@sourcefire.com
