While the NONENGLISH test is pretty effective (99%+), it does trip up occasionally. Our company does some business internationally, so it's a tough area.
Unfortunately for me, the NONENGLISH test has false positives on mail from the company that owns mine (an Australian company), on some of my own test mail, and occasionally on others, especially UTF-8 encoded mail. So I can only assign a low-to-moderate weight to it.

Discussion on NONENGLISH: http://www.mail-archive.com/[EMAIL PROTECTED]/msg18854.html

I went round and round on this Chinese-language mail (and Korean too). Message Sniffer wasn't effective, and text filters weren't effective (no English text). Spamdomains occasionally hurt some legit English-language mail from Chinese senders, so it couldn't be assigned a punishment weight.

I then tried checking for GB2312 encoding in the header to punish the Chinese mail. This is not a great indicator either. The English ASCII characters are a subset of GB2312, so a computer with a character set of GB2312 can and does send me a message in English that still has GB2312 in the header. Chinese characters in GB2312, by contrast, take two bytes each.

So someone else on this list created filters to check for Chinese. Since that person is the author, I don't feel comfortable sharing his work on these filters. Maybe he'll step out and volunteer it. Basically it looks for certain high-bit characters that are likely to occur in Chinese, and for certain character sets. It's compounded with some END statements to minimize false positives. It's as near to 100% effective as a filter can be, and I am able to assign it a high punishment weight. Lastly, it's a filter, not an external test.

In my original e-mail I said I have an external program that looks for a subject line that is all caps. I consider this a potential indicator of Nigerian/419 e-mails and use it in a filter I am working on.

Scott Fisher
Director of IT
Farm Progress Companies

>>> [EMAIL PROTECTED] 08/24/04 05:23PM >>>
I would be curious to hear on this as well. It's my understanding that the non-English test in Declude should catch this (Chinese in the subject)?
Why the need for an external test?

Darrell
------------------------------------------------------------------------
Check out http://www.invariantsystems.com for utilities for Declude And Imail. IMail/Declude Overflow Queue Monitoring, MRTG Integration, and Log Parsers.

Keith Johnson writes:

> Scott Fisher,
> I heard you mention once that you made a filter to catch Chinese characters
> in the subject, we have a few customers that get nailed by these often. Was
> wondering if you could share your thoughts. Thanks,
>
> Keith
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of Scott Fisher
> Sent: Tue 8/24/2004 12:12 AM
> To: [EMAIL PROTECTED]
> Cc:
> Subject: [Declude.JunkMail] External Test for Subject is Upper Case
>
> I've made an external test to test if the Subject is all upper case (or
> punctuation).
> If anyone is interested, let me know and I'll e-mail you a copy.
> ---
> [This E-mail was scanned for viruses by Declude Virus
> (http://www.declude.com)]
>
> ---
> This E-mail came from the Declude.JunkMail mailing list. To
> unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
> type "unsubscribe Declude.JunkMail". The archives can be found
> at http://www.mail-archive.com.
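
For anyone who wants to experiment, here is a rough Python sketch of the two ideas discussed in this thread: an all-caps subject check and a high-bit byte check. This is not the external test or the filter mentioned above (neither was posted to the list), and it is not Declude syntax. The function names are made up for illustration, and two points are assumptions: a subject with no letters at all is treated as not all caps, and the input to the byte check is assumed to be the raw 8-bit text, already decoded from any MIME encoded-word wrapping.

```python
def subject_is_all_caps(subject: str) -> bool:
    """Return True if the subject contains letters and none are lowercase.

    Digits, spaces, and punctuation are ignored. Assumption: a subject
    with no letters at all (e.g. "!!!") is reported as False.
    """
    letters = [ch for ch in subject if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def high_bit_ratio(raw: bytes) -> float:
    """Return the fraction of bytes that have the high bit set (>= 0x80).

    In the EUC-CN form of GB2312 normally used in mail, each Chinese
    character is two bytes, both with the high bit set, while ASCII
    text stays as single bytes below 0x80. English text labelled
    GB2312 therefore scores 0.0 here.
    """
    if not raw:
        return 0.0
    return sum(1 for b in raw if b >= 0x80) / len(raw)


if __name__ == "__main__":
    print(subject_is_all_caps("URGENT BUSINESS PROPOSAL"))
    print(subject_is_all_caps("Meeting notes for Tuesday"))
    # Same charset label, very different byte content:
    print(high_bit_ratio("Hello from Beijing".encode("gb2312")))
    print(high_bit_ratio("\u4f60\u597d".encode("gb2312")))
```

The byte check illustrates why the charset header alone is a weak signal: an English message and a Chinese message can both be labelled GB2312, but only the Chinese one contains high-bit bytes. Any cutoff on the ratio would need tuning against real mail.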