Hi all -

I've written to the list a couple of times before about using a
filtering engine (one that fits in the CPU cache) ahead of the
perfect-match scanner. We can easily achieve that with a Bloom
filter and cheap hash functions. Overall, in an apples-to-apples
comparison with ClamAV, we got performance improvements of 6-8x on
Windows executables, and much larger ones on HTML and random data.
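For anyone who hasn't seen the idea before, here is a minimal Python sketch of a Bloom-filter prefilter in front of an exact matcher. It is not HashAV's code. The window length `WINDOW`, the 4 KB filter size, and the two hash functions (32-bit FNV-1a and a multiplicative hash) are illustrative assumptions, and the `by_prefix` dict merely stands in for the perfect-match stage. The first `WINDOW` bytes of each signature go into the filter. The scanner hashes every `WINDOW`-byte window of the input and only runs the exact comparison when all of that window's filter bits are set.

```python
WINDOW = 4                      # hypothetical: bytes of each signature put in the filter
FILTER_LOG2 = 15
FILTER_BITS = 1 << FILTER_LOG2  # hypothetical: 32 Kbit (4 KB), small enough to stay in cache


def _h1(window: bytes) -> int:
    """32-bit FNV-1a, a cheap non-cryptographic hash, reduced to a bit index."""
    h = 0x811C9DC5
    for b in window:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h % FILTER_BITS


def _h2(window: bytes) -> int:
    """Multiplicative hash; keeps the top FILTER_LOG2 bits of a 32-bit product."""
    x = int.from_bytes(window, "little")
    return ((x * 2654435761) & 0xFFFFFFFF) >> (32 - FILTER_LOG2)


class BloomPrefilter:
    def __init__(self, signatures):
        self.bits = bytearray(FILTER_BITS // 8)
        self.by_prefix = {}  # stand-in for the perfect-match stage's index
        for sig in signatures:
            if len(sig) < WINDOW:
                raise ValueError("signature shorter than the filter window")
            prefix = sig[:WINDOW]
            for i in (_h1(prefix), _h2(prefix)):
                self.bits[i >> 3] |= 1 << (i & 7)
            self.by_prefix.setdefault(prefix, []).append(sig)

    def _maybe(self, window: bytes) -> bool:
        # True if every bit for this window is set (real hit or false positive)
        return all(self.bits[i >> 3] & (1 << (i & 7))
                   for i in (_h1(window), _h2(window)))

    def scan(self, data: bytes):
        """Return (offset, signature) for every exact signature match in data."""
        matches = []
        for off in range(len(data) - WINDOW + 1):
            window = data[off:off + WINDOW]
            if not self._maybe(window):
                continue  # common case: rejected by the small filter alone
            for sig in self.by_prefix.get(window, []):
                if data.startswith(sig, off):
                    matches.append((off, sig))
        return matches
```

For example, `BloomPrefilter([b"EVILCODE"]).scan(b"xxEVILCODEyy")` returns `[(2, b"EVILCODE")]`. Only windows that pass the filter touch the larger exact-match structures. That is the part that benefits from the filter staying cache-resident.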

Here's a paper draft that discusses our approach with tons of
performance data (and it's an easy read):

http://raconsoft.com/products/opensource/hash-av.pdf

http://raconsoft.com/products/opensource contains the bloom-clean.tar.gz
file (which includes the code and a README on how to run it, but not
the virus database or the sample test executables) and the directory
structure for the code.

This approach more or less breaks down for polymorphic viruses, where
the signature is a 2-byte prefix followed by a regexp; in that case
HashAV calls ClamAV directly. That's why I'm interested in writing an
emulation engine, with which we could get much better numbers. HashAV
also gets faster as CPUs become more powerful, because its performance
depends more on CPU speed than on memory access time. All in all, it's
700 lines of code and shouldn't be too hard to read.
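The fallback can be sketched the same way. In this hypothetical Python illustration, each signature is modeled as a literal prefix plus a regex tail (both `bytes`), and any signature whose literal prefix is shorter than the filter window (such as a 2-byte prefix followed by a regexp) is set aside. `fallback_scan` uses Python's `re` module purely as a stand-in for handing the buffer to ClamAV. `WINDOW`, `split_signatures`, and `fallback_scan` are illustrative names, not HashAV's.

```python
import re

WINDOW = 4  # hypothetical Bloom-filter window length in bytes


def split_signatures(sigs):
    """Partition (literal_prefix, regex_tail) pairs by whether the prefix fits the filter."""
    filterable, fallback = [], []
    for prefix, tail in sigs:
        (filterable if len(prefix) >= WINDOW else fallback).append((prefix, tail))
    return filterable, fallback


def fallback_scan(data: bytes, fallback):
    """Stand-in for the full scanner: try every short-prefix signature on the whole buffer."""
    hits = []
    for prefix, tail in fallback:
        pattern = re.compile(re.escape(prefix) + tail, re.DOTALL)
        hits.extend((m.start(), prefix) for m in pattern.finditer(data))
    return hits
```

A signature with the 2-byte prefix `b"MZ"` and tail `rb"\x90{3,}\xcc"` lands in the fallback set, and scanning `b"..MZ\x90\x90\x90\xcc.."` reports a hit at offset 2. In this sketch, every fallback signature costs a full pass over the buffer with no cheap filter in front of it.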

What do ClamAV developers think? What would be the pros and cons of
running HashAV?

Thanks,

Ozgun.
_______________________________________________
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-devel
