http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4794
Summary: is_charset_ok_for_locales() may be too generic
Product: Spamassassin
Version: 3.1.0
Platform: All
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P5
Component: Rules (Eval Tests)
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
I've configured:
ok_locales en fr
(or even just "en"), and I notice that messages written in Turkish, Cyrillic,
Greek, etc. all get through just fine, even though my only accepted locales are
English, or English and French. Apparently the sieve for language tests is too
coarse.
I'm thinking that for "en", the following rule should apply:
* the US-ASCII charset is fine;
* all 7-bit characters are fine;
* the 8-bit characters in ISO-8859-1 should be fine (if we want to be extra
liberal);
* the non-accented characters in ISO-8859-[2-4] should be fine (section sign,
non-breaking space, etc.).
Anything else should either fail the test outright, or be tolerated only in
small amounts: a message in which fewer than, say, 0.5% of the characters are
accented characters from these "borderline" character sets should pass, and
anything above that should fail (since someone might send a message in English
but write their name or signature in Greek, Russian, or whatever).
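
To make the proposal concrete, here is a minimal sketch of the threshold check in Python. It is not SpamAssassin's code and does not reflect how is_charset_ok_for_locales() works; the function names and the 0.5% default are assumptions made for illustration. It also assumes the message body has already been decoded to Unicode, so "fine" characters are simply those at or below U+00FF (US-ASCII plus ISO-8859-1). Symbols such as the section sign and non-breaking space sit at the same code points in ISO-8859-[2-4], so they pass as well.

```python
def foreign_fraction(text: str) -> float:
    """Share of characters outside ISO-8859-1 (code points above U+00FF)."""
    if not text:
        return 0.0
    foreign = sum(1 for ch in text if ord(ch) > 0xFF)
    return foreign / len(text)


def ok_for_en(text: str, threshold: float = 0.005) -> bool:
    """Hypothetical "en" check: pass only if fewer than `threshold`
    (0.5% by default) of the characters fall outside ISO-8859-1."""
    return foreign_fraction(text) < threshold


# An English body with a short Greek signature stays under the threshold,
# while a message written entirely in Cyrillic does not.
english_with_greek_name = "a" * 2000 + "\u039d\u03b9\u03ba\u03bf\u03c2"
cyrillic_message = "\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440"
print(ok_for_en(english_with_greek_name))
print(ok_for_en(cyrillic_message))
```

A real eval test would more likely work on the declared charset and raw bytes rather than decoded text, so treat this only as an illustration of the pass/fail rule.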