http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4794

           Summary: is_charset_ok_for_locales() may be too generic
           Product: Spamassassin
           Version: 3.1.0
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Rules (Eval Tests)
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


I've configured:

ok_locales en fr

(or even just "en"), and I notice that messages written in Turkish, Cyrillic,
Greek, etc. all get through just fine even though my locales are English, or
English and French.  Apparently the sieve for language tests is too coarse.

I think the following rule should apply for "en":

* the US-ASCII charset is fine;

* all 7-bit characters are fine;

* the 8-bit characters in ISO-8859-1 should be fine (if we want to be extra 
liberal);

* the non-accented characters in ISO-8859-[2-4] should be fine (section sign,
non-breaking space, etc.).

Anything else should then either fail the test outright, or, more leniently,
a small percentage (say, less than 0.5%) of accented characters from these
"borderline" character sets could pass while anything more fails, since
someone might send a message in English but write their name or signature in
Greek, Russian, or some other script.
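The proposed rule can be sketched roughly as follows. This is a hypothetical
Python illustration, not SpamAssassin's is_charset_ok_for_locales() (which is
Perl); the function names and the FOREIGN_RATIO_LIMIT constant are made up.
It assumes the message body has already been decoded to Unicode, treats
"non-accented" as "not a Unicode letter", and measures the 0.5% threshold over
non-whitespace characters. All three of those choices are assumptions.

```python
import unicodedata

# Hypothetical threshold from the report: up to 0.5% of the (non-whitespace)
# characters may fall outside the accepted set.
FOREIGN_RATIO_LIMIT = 0.005


def _encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be represented in the given single-byte codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def is_ok_for_en(ch: str, allow_latin1: bool = True) -> bool:
    """Classify one character under the proposed "en" rules (a sketch)."""
    if ord(ch) < 0x80:  # US-ASCII, i.e. all 7-bit characters
        return True
    if allow_latin1 and _encodable(ch, "latin_1"):  # ISO-8859-1 high half
        return True
    # Non-letter characters (section sign, etc.) found in ISO-8859-2..4.
    # Equating "non-accented" with "not a Unicode letter" is an assumption.
    if not unicodedata.category(ch).startswith("L"):
        return any(_encodable(ch, c) for c in ("iso8859_2", "iso8859_3", "iso8859_4"))
    return False


def text_ok_for_en(text: str, allow_latin1: bool = True,
                   limit: float = FOREIGN_RATIO_LIMIT) -> bool:
    """Pass the text if the share of disallowed characters is at most `limit`."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return True
    foreign = sum(1 for ch in chars if not is_ok_for_en(ch, allow_latin1))
    return foreign / len(chars) <= limit
```

Calling text_ok_for_en(body, limit=0) gives the strict variant, where any
other character fails. The default lets a short Greek or Cyrillic signature
through when the rest of a long message is English.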


