https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7815
RW <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #8 from RW <[email protected]> ---
(In reply to Henrik Krohns from comment #7)
> Well normalize_charset 1 seems to fix the HTML/charset parsing, so try that.
> It's something that in future (or even now) is supposed to be used anyway.

From a quick look in the 'languages' file distributed with the rules, it looks
like UTF-8 is only supported for Amharic and Yiddish.

I guess TextCat continues to work with modern UTF-8 mail because there are
enough pure ASCII ngrams for many languages that use the Roman alphabet.

normalize_charset 1 should break the detection of Arabic altogether.
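
To illustrate the ASCII ngram point, here is a minimal Python sketch. It is not TextCat code and does not read the 'languages' file. It assumes, purely for illustration, that a language model was built from text in a legacy single-byte encoding (ISO-8859-1 and windows-1256 are stand-ins), and it checks how many byte trigrams of the same text encoded as UTF-8 would still match. The function names and sample strings are made up.

```python
def byte_ngrams(data, n=3):
    """Return the set of all length-n byte sequences in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def shared_fraction(text, legacy_encoding, n=3):
    """Fraction of the UTF-8 byte n-grams of text that also occur
    when the same text is encoded with legacy_encoding."""
    legacy = byte_ngrams(text.encode(legacy_encoding), n)
    utf8 = byte_ngrams(text.encode("utf-8"), n)
    return len(utf8 & legacy) / len(utf8)


# Hypothetical sample strings, chosen only for this sketch.
french = "Le rapport est arrivé ce matin"  # ASCII except for one accented letter
arabic = "مرحبا بالعالم"                    # no ASCII letters

# Most trigrams survive: ASCII bytes are identical in both encodings.
print(shared_fraction(french, "iso-8859-1"))

# No trigram survives: every Arabic letter is one byte in windows-1256
# but two different bytes in UTF-8.
print(shared_fraction(arabic, "windows-1256"))
```

Under that assumption, Roman-alphabet text keeps most of its trigrams after conversion to UTF-8, while the Arabic sample keeps none, which is consistent with the expectation above that Arabic detection would stop working.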
