Mike,

Good point, but there is a problem. What you have is HTML-encoded Unicode, and there are thousands upon thousands of these characters (see http://www.alanwood.net/unicode/unicode_samples_no.html for samples). There might be a good reason for them in multilingual mailings. Still, I don't think mail clients would use this method, because base64 encoding carries far less overhead than HTML encoding does.
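To put a rough number on that overhead, here is a minimal Python sketch that encodes the first word of the spam sample both ways. It assumes every character is written as a decimal numeric reference (as in the spam) and ignores MIME headers and line breaks; the helper names are made up for illustration.

```python
import base64


def html_encode(text: str) -> str:
    """Write every character as a decimal HTML numeric reference, e.g. &#1053;."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def base64_encode(text: str) -> str:
    """Base64 of the UTF-8 bytes, as a base64 MIME part would carry them."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


word = "Недорогие"  # first word of the spam sample quoted in this thread
html_form = html_encode(word)
b64_form = base64_encode(word)

# Characters in the word, bytes as HTML entities, bytes as base64.
print(len(word), len(html_form), len(b64_form))  # 9 63 24
```

Each Cyrillic letter costs seven bytes as an entity (`&#` + four digits + `;`) versus two bytes in UTF-8, which base64 inflates by only a third.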

You could potentially test for just ";&#" to find two HTML-encoded characters of any type in succession. However, there are valid uses where two symbols are listed in succession, so FPs would probably come into play. Such cases would probably be rare, so if you score the filter low to begin with, this wouldn't have a big impact. That three-character string would also replace 62 of the BODY checks in that filter and save some processing; I just don't know that it would be safe to do.
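A minimal Python sketch of what the proposed one-line test would and would not match. The helper name and the sample strings are hypothetical; the star-rating string is the kind of legitimate "two symbols in succession" case that would be an FP.

```python
# The three-character string proposed for the filter.
MARKER = ";&#"


def has_consecutive_entities(body: str) -> bool:
    """Return True if the body contains ';&#', i.e. a ';' (such as the end
    of an entity) immediately followed by a numeric entity."""
    return MARKER in body


samples = {
    # Spam: every letter HTML-encoded, so entities run back to back.
    "spam": "&#1053;&#1077;&#1076;&#1086;&#1088;&#1086;&#1075;&#1080;&#1077;",
    # Legitimate, but still matches: this is an FP.
    "stars": "Rating: &#9733;&#9733;&#9733;",
    # A single entity is not followed by '&#', so it does not match.
    "euro": "Price: 10&#8364; only",
}

for name, body in samples.items():
    print(name, has_consecutive_entities(body))
```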

If someone with a decent mail volume and a decent number of clients with foreign-language customers would like to test this for FPs and report back to the list, that would be valuable. The filter would be the following:
-----Global.cfg-----
HTMLENCODE-TEST   filter        C:\IMail\Declude\Filters\HTMLEncode-Test.txt    x    0    0

-----HTMLEncode-Test.txt-----
BODY      0   CONTAINS   ;&#

-----$Default$.JunkMail-----
HTMLENCODE-TEST   COPYTO   [EMAIL PROTECTED]

I don't think my volume is large enough to get a feel for the potential FPs from this modification. The existing filter, though, should hardly ever produce an FP.

Matt



Mike K wrote:
You may want to account for foreign languages too. I received this spam
just as I was adding your URL obfuscation filter.

Недорогие
звонки
зарубеж!
(Russian: "Inexpensive calls abroad!")

Mike


----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, September 15, 2003 12:40 PM
Subject: Re: [Declude.JunkMail] OBFUSCATION filter


  
Pete,

It's not redundant. The two individual tests only check for two
encoded characters of the same type in succession, while the
combination test checks for one of each in succession. This way, if
they alternate between the two encodings, it will get caught as long
as there is a "." or "@" between them, or as long as URL encoding is
followed by HTML encoding. I left out the reverse order because its
string, ";%", is only two characters long, and I wanted to protect
against FPs.
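To illustrate why the same-type tests alone miss alternating encodings, here is a minimal Python sketch. The regular expressions are hypothetical stand-ins for the filter's literal strings, not the actual OBFUSCATION filter entries, and the sample URL is made up.

```python
import re

URL_ESC = r"%[0-9A-Fa-f]{2}"  # URL (percent) encoding, e.g. %77
HTML_ESC = r"&#\d+;"          # HTML decimal entity, e.g. &#119;

# Hypothetical patterns: two of the same type, or one of each in a row.
PATTERNS = {
    "two_url": re.compile(URL_ESC + URL_ESC),
    "two_html": re.compile(HTML_ESC + HTML_ESC),
    "url_then_html": re.compile(URL_ESC + HTML_ESC),
    "html_then_url": re.compile(HTML_ESC + URL_ESC),  # the ";%" direction
}


def matching_checks(text: str) -> set:
    """Return the names of the patterns that match somewhere in text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


# "www" written as URL, HTML, URL encoding: no two of the same type touch.
alternating = "http://%77&#119;%77.example.com/"
print(sorted(matching_checks(alternating)))
```

Only the mixed checks fire on the alternating sample, which is the case the same-type tests cannot catch on their own.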

I do appreciate the feedback, though. I do make mistakes, of course.

Matt

Pete McNeil wrote:

    
Matt,

It appears that your test for a combination of HTML & URL encoding
in URLs is redundant, since you capture both types individually. It's
a small optimization, but worth mentioning.

_M

At 07:46 PM 9/14/2003 -0400, you wrote:

      
I've posted a newer version of the OBFUSCATION filter on my site.
It removes the attachment check and also drops 6 of the more than
100 tests to make the filter more forgiving, aside from the PayPal
issue.


        
http://208.7.179.20/decludefilters/obfuscation/obfuscation_09-14-2003c.txt
  
If you find any false positives besides the Ticketmaster one that
I've already counterbalanced, please let me know. I imagine that
posting to this group would be better than PMs, unless others mind
having the discussion here. That way everyone would know about any
issues ASAP.

Thanks,

Matt
        
