Ah.. that helps a lot, Dave.

P.S. Excuse my typos. Touched and not typed.

On Mar 25, 2013 5:39 PM, "David Raynor" <dray...@sourcefire.com> wrote:
> On Sat, Mar 23, 2013 at 3:34 PM, Kaushik Vaidyanathan
> <kvaid...@andrew.cmu.edu> wrote:
>
> > Hi Matt
> >
> > Thanks for your detailed explanation of how signatures get stored and
> > interpreted.
> >
> > I was looking through the code in libclamav to see what data formats
> > get used for string comparison. Some backtracking from cli_bm_scanbuff
> > took me to str.c, where I see there is a function "cli_hex2str",
> > which, if I understand correctly, maps two hex digits to one character
> > (unsigned char). Would it be fair to speculate that this function is
> > used by the ClamAV engine to map two hex digits read from a signature
> > or scanned file into one char for string matching purposes?
> >
> > Thank you.
> >
> > On Sat, Mar 23, 2013 at 11:02 AM, Matt Olney <mol...@sourcefire.com>
> > wrote:
> >
> > > Well... data is data. There is no difference (from a storage
> > > perspective) between an executable with an "inc ecx" instruction and
> > > a text document with an "A". Both are represented by the value 0x41.
> > > So from Clam's perspective, a signature matching a single A would be
> > > identical to a signature that detected a single "inc ecx"
> > > instruction. Both would look for 41.
> > >
> > > In short, your statement "some files are hex and some are
> > > character-based" isn't really accurate. At the risk of painting with
> > > a broad brush, I would say that all files are stored as a series of
> > > values, a series of bytes. How you display them is different. When I
> > > use 010 Editor to view a file as hex, I get a set of ASCII-hex
> > > representations. When I look at a file with a web browser, I get
> > > ASCII text. But underlying all of that is the same idea, a set of
> > > bytes. And that is how ClamAV treats all files.
> > >
> > > A signature with a 41 in it would be converted in memory to look for
> > > 0x41, a single byte of value 0x41. A signature written like that
> > > would detect an executable or PDF or Flash file or anything that has
> > > 0x41 in the data.
> > >
> > > Hope that answers your question.
> > >
> > > Matt
> > >
> > > On Fri, Mar 22, 2013 at 8:46 PM, Kaushik Vaidyanathan
> > > <kvaid...@andrew.cmu.edu> wrote:
> > >
> > > > Hi
> > > >
> > > > I have a basic question. Most body-based signatures are hex-based
> > > > (let's focus on fixed-string signatures alone for simplicity),
> > > > whereas some of the files are hex (EXE) or character-based (HTML).
> > > >
> > > > In the code I see unsigned chars used predominantly to represent
> > > > patterns and file contents. At the very core of the string
> > > > matching algorithms, mainly extended Boyer-Moore, I would like to
> > > > understand how the data types get manipulated.
> > > >
> > > > 1) Do the character-based files get translated to hex to compare
> > > >    with body-based signatures?
> > > >
> > > > 2) Does the signature get treated as a string of chars?
> > > >    If yes,
> > > >    does a toy signature "fe" get treated as two chars (8 bits
> > > >    each) for "f" and "e", (or)
> > > >    does the code read the signature "fe" and map it into one
> > > >    character based on the ASCII table (for example)?
> > > >
> > > > Thank you.
> > > > _______________________________________________
> > > > http://lurker.clamav.net/list/clamav-devel.html
> > > > Please submit your patches to our Bugzilla: http://bugs.clamav.net
>
> Read from signature, yes. Read from file, no. To quickly compare bytes,
> it is better to do it using the in-file binary representation. It is
> more direct to say that cli_hex2str() converts the human-readable
> representation of a hexadecimal number into its binary equivalent. For
> any byte pattern to match, the signature-format equivalent will take
> twice as many bytes as the raw binary value.
>
> Example: "Hex" in ASCII
> Actual data is 3 bytes long. 1st byte: 0x48. 2nd byte: 0x65. 3rd byte: 0x78
> Signature-format equivalent is 6 bytes long, one for each hex digit.
>
> This is where the name of the function came from. Input and output are
> both char arrays (i.e. strings). The function takes in the "hex"-format
> version of the content [486578], and returns the content in a usable
> string format [Hex]. Hence, from "hex" to string.
>
> Dave R.
>
> --
> Dave Raynor
> Sourcefire Vulnerability Research Team
> dray...@sourcefire.com
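
To make the hex-to-binary step Dave describes concrete, here is a minimal sketch in C. It is not ClamAV's actual cli_hex2str() code: the names hex_digit_value and hex_to_bytes are hypothetical, and the sketch assumes a plain fixed-string signature with no wildcards or other signature syntax.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: numeric value of one ASCII hex digit, or -1. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical illustration of the idea behind cli_hex2str(); this is
 * NOT the libclamav implementation.
 *
 * Decodes the NUL-terminated hex text `hex` into raw bytes in `out`.
 * Every two hex digits become one byte, so the output is half as long
 * as the input. Returns the number of bytes written, or -1 if the input
 * has an odd length, contains a non-hex character, or does not fit.
 */
long hex_to_bytes(const char *hex, unsigned char *out, size_t out_size)
{
    size_t len = strlen(hex);
    size_t i;

    if (len % 2 != 0 || len / 2 > out_size)
        return -1;

    for (i = 0; i < len; i += 2) {
        int hi = hex_digit_value(hex[i]);     /* high nibble */
        int lo = hex_digit_value(hex[i + 1]); /* low nibble  */
        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    return (long)(len / 2);
}
```

With this sketch, hex_to_bytes("486578", buf, sizeof buf) writes the three bytes 0x48 0x65 0x78 ("Hex") and returns 3, matching the 6-to-3 ratio in Dave's example. Likewise, the signature text "41" decodes to the single byte 0x41, which is the value Matt's example would then look for in the scanned data, regardless of file type.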