While the NONENGLISH test is pretty effective (99%+), it does trip up occasionally.
Our company does some business internationally so it's a tough area.

Unfortunately for me, the NONENGLISH test produces false positives on mail from our parent 
company (an Australian company), on some of my own test mail, and occasionally on others, 
especially UTF-8 encoded messages. So I can only assign it a low-to-moderate weight.
For a discussion of the NONENGLISH test, see:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg18854.html

I went round and round on this Chinese-language mail (and Korean too). Message Sniffer 
wasn't effective, and text filters weren't effective because there is no English text to 
match. Spamdomains occasionally hurt some legitimate English-language mail from China, so 
it couldn't be assigned a punishment weight.

I then tried checking the header for GB2312 encoding to punish the Chinese mail. This is 
not a great indicator either: the English ASCII characters are a subset of GB2312, so a 
computer whose character set is GB2312 can, and does, send me messages in English that 
declare GB2312 in the header.
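To make the header problem concrete, here is a minimal Python sketch (my illustration, not 
part of any Declude test) that reads the declared charset with the standard library email 
package. The sample message and the helper name declared_charsets are hypothetical. A 
plain-English body can still declare GB2312, so the header alone cannot tell you the text 
is Chinese.

```python
from email import message_from_string


def declared_charsets(raw: str) -> set:
    """Return the lowercase charsets declared in the Content-Type of each MIME part."""
    msg = message_from_string(raw)
    # get_charsets() yields one entry per part; None means no charset was declared.
    return {c.lower() for c in msg.get_charsets() if c}


# Hypothetical sample: pure ASCII English text that still declares GB2312.
english_but_gb2312 = (
    "Subject: Quarterly report\n"
    "Content-Type: text/plain; charset=GB2312\n"
    "\n"
    "Please see the attached figures.\n"
)

print("gb2312" in declared_charsets(english_but_gb2312))  # True, yet the body is English
```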

The GB2312 character set uses two bytes to store each Chinese character.
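Because of that two-byte layout, the body bytes say more than the header does. The sketch 
below, a hypothetical Python helper called count_gb2312_pairs, counts byte pairs in the 
GB2312 (EUC-CN) range: lead byte 0xA1-0xF7, trail byte 0xA1-0xFE. ASCII bytes are always 
below 0x80, so an English message scores zero even when it declares GB2312. Treat it as a 
rough heuristic only: Latin-1 text with two accented letters in a row can also match.

```python
def count_gb2312_pairs(data: bytes) -> int:
    """Count non-overlapping byte pairs that look like GB2312 (EUC-CN) characters.

    Lead byte 0xA1-0xF7, trail byte 0xA1-0xFE; ASCII bytes are always < 0x80.
    """
    count = 0
    i = 0
    while i < len(data) - 1:
        lead, trail = data[i], data[i + 1]
        if 0xA1 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE:
            count += 1
            i += 2  # consume the whole two-byte character
        else:
            i += 1
    return count


chinese = "你好，世界".encode("gb2312")     # five double-byte characters
english = "Hello, world".encode("gb2312")  # ASCII subset: one byte per character
print(count_gb2312_pairs(chinese), count_gb2312_pairs(english))  # 5 0
```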

So someone else on this list created filters to check for Chinese. Since he is the 
author, I don't feel comfortable sharing his work on these filters. Maybe he'll step 
forward and volunteer it.

Basically, it looks for certain high-bit characters that are likely to occur in Chinese, 
and for certain character sets. It's combined with some END statements to minimize false 
positives. It's as near to 100% effective as a filter can be, and I am able to assign it 
a high punishment weight.
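I can't share the actual filter, but the general idea (high-bit characters plus character 
sets, with checks that bail out early to avoid false positives) can be sketched in Python. 
This is my own hypothetical approximation, not the author's filter and not Declude filter 
syntax; the names looks_chinese and double_byte_pairs, the charset list, and both 
thresholds are assumptions you would tune against your own mail.

```python
from typing import Optional

# Hypothetical values for illustration only; tune them against your own mail.
CJK_CHARSETS = {"gb2312", "gbk", "gb18030", "big5"}
MIN_PAIRS = 10
MIN_PAIR_RATIO = 0.3  # share of body bytes that belong to double-byte pairs


def double_byte_pairs(data: bytes) -> int:
    """Count non-overlapping byte pairs with both bytes in 0xA1-0xFE."""
    count, i = 0, 0
    while i < len(data) - 1:
        if 0xA1 <= data[i] <= 0xFE and 0xA1 <= data[i + 1] <= 0xFE:
            count += 1
            i += 2
        else:
            i += 1
    return count


def looks_chinese(body: bytes, charset: Optional[str]) -> bool:
    pairs = double_byte_pairs(body)
    # Bail out early: too few high-bit pairs means nothing to punish,
    # even if the header declares GB2312.
    if pairs < MIN_PAIRS:
        return False
    # A declared Chinese charset plus real double-byte content is enough.
    if charset is not None and charset.lower() in CJK_CHARSETS:
        return True
    # Charset missing or not Chinese: require the pairs to dominate the body.
    return 2 * pairs / len(body) >= MIN_PAIR_RATIO


chinese_body = ("你好世界" * 5).encode("gb2312")  # 20 pairs, 40 bytes
english_body = b"Please see the attached figures."
print(looks_chinese(chinese_body, "GB2312"))  # True
print(looks_chinese(english_body, "gb2312"))  # False
print(looks_chinese(chinese_body, None))      # True
```

A declared Chinese charset lowers the bar but never decides on its own: a pure-ASCII body 
with a GB2312 header is still let through, which is exactly where the header-only check 
went wrong.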

Lastly, it's a filter, not an external test.
In my original e-mail I said I have an external program that checks whether the subject 
line is all caps. I consider this a potential indicator of Nigerian/419 e-mails and use 
it in a filter I am working on.
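For reference, the all-caps check itself is simple. Here is a hypothetical Python sketch 
(not the actual external program mentioned above) that decodes RFC 2047 encoded subjects 
with the standard email.header module and then requires every cased letter to be upper 
case; digits, spaces, and punctuation are ignored. The MIN_CASED_LETTERS cutoff is an 
assumption to keep short subjects such as "OK" from matching.

```python
from email.header import decode_header, make_header

MIN_CASED_LETTERS = 4  # hypothetical: ignore very short subjects such as "OK"


def decode_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words (e.g. =?utf-8?q?...?=) into plain text."""
    return str(make_header(decode_header(raw_subject)))


def subject_is_all_caps(raw_subject: str) -> bool:
    """True if every cased letter in the subject is upper case.

    Digits, spaces, and punctuation are ignored, so "URGENT!!! $10,000,000"
    counts as all caps.
    """
    subject = decode_subject(raw_subject)
    cased = [ch for ch in subject if ch.isupper() or ch.islower()]
    if len(cased) < MIN_CASED_LETTERS:
        return False
    return all(ch.isupper() for ch in cased)


for s in ["URGENT BUSINESS PROPOSAL!!!", "Lunch on Friday?", "OK"]:
    print(f"{s!r}: {subject_is_all_caps(s)}")  # True, False, False
```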

Scott Fisher
Director of IT
Farm Progress Companies

>>> [EMAIL PROTECTED] 08/24/04 05:23PM >>>
I would be curious to hear about this as well. My understanding is that the NONENGLISH 
test in Declude should catch this (Chinese in the subject). Why the need for an external 
test? 

Darrell 

 ------------------------------------------------------------------------
Check out http://www.invariantsystems.com for utilities for Declude and 
IMail: IMail/Declude Overflow Queue Monitoring, MRTG Integration, and Log 
Parsers. 


Keith Johnson writes: 

> Scott Fisher,
> I heard you mention once that you made a filter to catch Chinese characters 
> in the subject. We have a few customers who get nailed by these often. I was 
> wondering if you could share your thoughts. Thanks,
>  
> Keith  
> 
>       -----Original Message----- 
>       From: [EMAIL PROTECTED] on behalf of Scott Fisher 
>       Sent: Tue 8/24/2004 12:12 AM 
>       To: [EMAIL PROTECTED] 
>       Cc: 
>       Subject: [Declude.JunkMail] External Test for Subject is Upper Case
>       
>        
> 
>       I've made an external test that checks whether the Subject is all upper case 
> (or punctuation).
>       If anyone is interested, let me know and I'll e-mail you a copy.
>       ---
>       [This E-mail was scanned for viruses by Declude Virus 
> (http://www.declude.com)] 
>       
>       ---
>       This E-mail came from the Declude.JunkMail mailing list.  To
>       unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
>       type "unsubscribe Declude.JunkMail".  The archives can be found
>       at http://www.mail-archive.com.
>        
> 
 
