Hi all - I've written to the list a couple of times before about running a filtering engine (one that fits in the CPU cache) ahead of the exact-match scanner. We can do that easily with a Bloom filter and cheap hash functions. Overall, in an apples-to-apples comparison with ClamAV, we got performance improvements of 6-8x on Windows executables, and much more on HTML and random data.
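To make the idea concrete, here is a minimal sketch of a Bloom-filter prefilter placed in front of an exact matcher. It is not the HashAV code: the 4-byte window, the 32 KiB filter size, the two multiplicative hashes, and the names `BloomFilter`, `WINDOW` and `scan` are all assumptions made for illustration, and the exact-match step is a plain prefix comparison standing in for the real scanner.

```python
from typing import Iterable, List, Tuple

WINDOW = 4  # hypothetical: number of leading signature bytes fed to the filter


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 18) -> None:
        # 2**18 bits is 32 KiB, chosen here so the table can stay cache resident.
        self.num_bits = num_bits
        self.bits = bytearray(num_bits // 8)

    def _positions(self, chunk: bytes) -> Tuple[int, int]:
        # Two cheap multiplicative hashes (illustrative, not HashAV's).
        value = int.from_bytes(chunk, "little")
        h1 = (value * 2654435761) & 0xFFFFFFFF
        h2 = (value * 40503 + 0x9E3779B9) & 0xFFFFFFFF
        return h1 % self.num_bits, h2 % self.num_bits

    def add(self, chunk: bytes) -> None:
        for pos in self._positions(chunk):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, chunk: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(chunk))


def scan(data: bytes, signatures: Iterable[bytes]) -> List[bytes]:
    sigs = list(signatures)
    bloom = BloomFilter()
    for sig in sigs:
        if len(sig) < WINDOW:
            # Signatures with too short a fixed prefix cannot go through
            # the filter and would need a fallback scanner instead.
            raise ValueError("signature shorter than the filter window")
        bloom.add(sig[:WINDOW])

    found: List[bytes] = []
    for i in range(len(data) - WINDOW + 1):
        # Cheap check first: most offsets are rejected here.
        if not bloom.might_contain(data[i:i + WINDOW]):
            continue
        # Filter hit (possibly a false positive): confirm with an exact compare.
        for sig in sigs:
            if data.startswith(sig, i) and sig not in found:
                found.append(sig)
    return found
```

For example, `scan(b"xxxxEICAR-TESTyyyy", [b"EICAR-TEST", b"MALWARE1"])` reports only `b"EICAR-TEST"`. Because every filter hit is confirmed by an exact comparison, false positives in the filter cost time but never change the result.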
Here's a paper draft that discusses our approach, with tons of performance data (and it's an easy read):
http://raconsoft.com/products/opensource/hash-av.pdf

http://raconsoft.com/products/opensource contains a bloom-clean.tar.gz file, which includes the code and a README on how to run it, along with the directory structure for the code. It does not, however, contain the virus database or sample test executables.

This approach more or less breaks down for polymorphic viruses, where we have a signature prefix of 2 bytes followed by a regexp; in that case it calls ClamAV directly. That's why I'm interested in writing an emulation engine, which should give us much better numbers. HashAV should also get faster as CPUs become more powerful, because it relies more on CPU speed than on memory access time.

All in all, it's 700 lines of code, and shouldn't be too hard to read. What do the ClamAV developers think? What would be the pros and cons of running HashAV?

Thanks,
Ozgun.
_______________________________________________
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-devel
