Mike,

Good point, however there is a problem. What you have is HTML encoded UNICODE, and there are thousands upon thousands of these: http://www.alanwood.net/unicode/unicode_samples_no.html , and there might be a good reason for this in multi-lingual mailings. I don't think though that mail clients would be supporting this method because base64 encoding is a lot more efficient with the overhead than HTML encoding is.

You could potentially test for just ";&#" in order to find two HTML encoded characters of any type in succession, however there are valid uses where you are listing two symbols in succession and the FP's would probably come into play. Such examples would probably be rare, so if you score the filter low in the first place, this wouldn't have a big impact. Adding that three character string would also defeat the need for 62 of the BODY checks in that filter and save on some processing, I just don't know that it would be safe to do.

If someone with a decent mail volume and a decent number of clients that have foreign language customers would like to test this for FP's and let the list know, that would be valuable. The filter would be the following:

-----Global.cfg-----

I don't think my volume is large enough to get a feeling for the potential of FP's from this modification. The existing filter though should hardly ever get an FP.

Matt

Mike K wrote:

May want to account for foreign languages also. I just received this spam while I was adding your URL obfuscation filter.

Недорогие звонки зарубеж!

Mike

----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, September 15, 2003 12:40 PM
Subject: Re: [Declude.JunkMail] OBFUSCATION filter

Pete,

It's not redundant because the two by themselves only check for strings of two, while the combination checks for strings with one of each in succession. This way, if they go back and forth between the two, it will get caught as long as there is a "." or "@" between them, or as long as it is URL encoding followed by HTML encoding. I left out the other way around because it was only a two character string, ";%" and wanted to protect from FP's.

I do appreciate the feedback though...I do of course make mistakes.

Matt

Pete McNeil wrote:

Matt,

It appears that your coding for a combination of http & url encoding in urls is redundant since you capture both types individually. It's a small optimization, but worth mentioning.

_M

At 07:46 PM 9/14/2003 -0400, you wrote:

I've posted a newer version of the OBFUSCATION filter on my site. This contains the removal of the attachment thing and also the removal of 6 (of over 100) tests in order to be more forgiving, sans the PayPal issue.

http://208.7.179.20/decludefilters/obfuscation/obfuscation_09-14-2003c.txt

If you find any false positives with this besides the Ticketmaster one that I've already counterbalanced, please let me know. I would imagine that posting to this group would be better than PM's unless others mind having discussion here. That way everyone would know about any issues ASAP.

Thanks,
Matt
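For anyone who wants to get a feel for the ";&#" idea before touching a live filter, here is a rough Python sketch of what that three character test would and would not catch. This is not Declude syntax and not the filter from the thread; the function names and the sample strings are made up for illustration, and it assumes the message body arrives with non-ASCII characters written as decimal HTML entities.

```python
# Assumption: the literal three-character string proposed in the thread.
MARKER = ";&#"

def to_numeric_entities(text: str) -> str:
    """Write every non-ASCII character as a decimal HTML entity (&#NNNN;).

    Hypothetical helper, used only to build sample bodies for this sketch.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def has_adjacent_entities(body: str) -> bool:
    """True if an entity ends (';') exactly where a numeric entity starts ('&#')."""
    return MARKER in body

# The Cyrillic spam subject from the thread, re-encoded as HTML entities.
spam = to_numeric_entities("Недорогие звонки зарубеж!")

# A single symbol on its own does not contain the marker.
legit = "Price: 5 &#8364; per month"

# Two symbols listed back to back: the valid use that could become an FP.
two_symbols = "Copyright &#169;&#174; Example Corp"

for name, body in [("spam", spam), ("legit", legit), ("two_symbols", two_symbols)]:
    print(name, has_adjacent_entities(body))
```

The third sample is the false positive case mentioned above: a legitimate message with two encoded symbols in a row trips the same test as the encoded spam, which is why a low score for this test seems prudent.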