Hi,
I'm working with Devon Carraway's URIBL plugin and have been testing how
well it finds URIs, using roughly 6 million lines of email traffic from a
day in the life of our mail servers. My testing shows that the following
line, in the while ( $l = $transaction->body_getline ) loop within
lookup_start, is problematic:

    # Dodge inserted-semicolon munging
    $l =~ tr/;//d;

Unlike the other "dodge this sort of munging" operations in that loop,
neither my test results nor Uncle Google have made it clear to me what
"inserted-semicolon munging" really is. Can anyone shed light on how
semicolons could be used to obfuscate URIs so that the URIBL plugin can't
detect them? If I understood that, perhaps I could come up with a safer
alternative.

I'll paste some of the output of my test script to demonstrate the effect
of tr/;//d. The 'Original result' is what we find with tr/;//d in place;
the 'New result' is what we find without it.
<TD>&nbsp;<B>Required:</B><BR><BR><FONT size=2
face=Tahoma>&nbsp;.NET</FONT><BR><FONT size=2
face=Tahoma>&nbsp;Blah</FONT><BR><FONT size=2
face=Tahoma>&nbsp;Blah</FONT><BR></TD>
Results differ!
Original result: nbsp.net
New result: no match
Wichita, KS 67204, USA&nbsp;www.somesite.com =
Results differ!
Original result: nbspwww.somesite.com
New result: www.somesite.com
... you get the picture.
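
For anyone who wants to reproduce this without my test corpus, here is a
minimal sketch of what I believe is going on. It assumes the original HTML
carried an &nbsp; entity directly in front of the text, which is what the
results above suggest. Note that $host_re and find_host() are hypothetical
stand-ins I made up for this mail; the plugin's real matching code is more
involved, so treat this as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical, much simplified host matcher (NOT the plugin's real regex).
my $host_re = qr/((?:[a-z0-9-]+\.)+(?:com|net|org))\b/i;

# Return the first host-like token found in $line, or 'no match'.
# When $strip_semicolons is true, apply the same tr/;//d the plugin uses.
sub find_host {
    my ($line, $strip_semicolons) = @_;
    $line =~ tr/;//d if $strip_semicolons;
    return $line =~ $host_re ? lc($1) : 'no match';
}

# Sample lines, assuming an &nbsp; entity sat in front of the text.
my @samples = (
    'face=Tahoma>&nbsp;.NET</FONT><BR>',
    'Wichita, KS 67204, USA&nbsp;www.somesite.com',
);

for my $s (@samples) {
    printf "Original result: %-22s New result: %s\n",
        find_host($s, 1), find_host($s, 0);
}
```

Deleting the semicolon glues the entity name onto whatever follows it, so
"&nbsp;.NET" becomes "&nbsp.NET" and "nbsp" ends up looking like the first
label of a hostname.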
-Jared