On 9/19/11 8:46 AM, Michael Orlitzky wrote:

A hostname cannot be all digits and, except when an IP address is used,
there will be a TLD, so if you see a pattern such as

   http:// 123456789/ cgi-bin/innocent_code.pl

(ignore the spaces; they are there to let this post slip by most antispam
detection) then you can surmise it is an attempt at obfuscation.

I don't get it, what's the pattern we're looking for? An IP address is a
number. Any way you specify it is fine. 123456789 is no more obfuscated
than whatever it would be if you converted it to dotted quad. They both
represent the same number.

If you're trying to match a text pattern against an integer, you're
doing it wrong.


FWIW, here is a regex that will match the "suspicious" pattern example:

"http:\/\/[0-9]{1,9}\/cgi-bin\/.*\.pl"

How valuable this pattern may be, I don't know. It can turn up false positives, but my philosophy is that if a real post is indistinguishable from spam, there is probably something wrong with the post. Evaluating the pattern against actual spam volume is a key part of the go/no-go decision to use it. In my milter (J-Chkmail) I give experiments like this a very low weight so that they cannot easily condemn a message. That lets me evaluate them in a live system with little concern that they will trigger a FP.
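
For anyone who wants to try the idea outside a milter, here is a minimal Python sketch of the regex above combined with a low-weight score. It is not J-Chkmail's configuration or API. The function names, the weight of 0.5, and the threshold of 5.0 are hypothetical, picked only for illustration. The only change to the pattern is a capture group around the digits, so the integer host can be converted to dotted quad for logging. The {1,9} limit is kept as posted, although a 32-bit address written as one integer can run to ten digits.

```python
import ipaddress
import re

# Pattern from the post, with a capture group added around the all-digit host.
# In Python, "/" needs no escaping inside a regex.
INTEGER_HOST_URL = re.compile(r"http://([0-9]{1,9})/cgi-bin/.*\.pl")

# Hypothetical numbers, for illustration only.
EXPERIMENTAL_WEIGHT = 0.5  # low weight given to an unproven rule
REJECT_THRESHOLD = 5.0     # score at which a message would be condemned


def integer_host_hits(body):
    """Return (matched_url, dotted_quad) pairs for each all-digit host URL."""
    hits = []
    for match in INTEGER_HOST_URL.finditer(body):
        # An integer host is just a 32-bit address written in decimal.
        dotted = str(ipaddress.IPv4Address(int(match.group(1))))
        hits.append((match.group(0), dotted))
    return hits


def score_message(body, base_score=0.0):
    """Add the experimental weight once if the pattern fires at all."""
    score = base_score
    if integer_host_hits(body):
        score += EXPERIMENTAL_WEIGHT
    return score


def should_reject(body, base_score=0.0):
    """True only when the combined score reaches the reject threshold."""
    return score_message(body, base_score) >= REJECT_THRESHOLD


sample = "click http://123456789/cgi-bin/innocent_code.pl now"
print(integer_host_hits(sample))
print(score_message(sample), should_reject(sample))
```

With these numbers the experimental rule alone never reaches the threshold. It can only tip a message that other checks (passed in here as base_score) already scored close to rejection, which is the point of weighting it low.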

dp
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml