https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7815

RW <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #8 from RW <[email protected]> ---
(In reply to Henrik Krohns from comment #7)
> Well normalize_charset 1 seems to fix the HTML/charset parsing, so try that.
> It's something that in future (or even now) is supposed to be used anyway.

From a quick look at the 'languages' file distributed with the rules, it looks
like UTF-8 is only supported for Amharic and Yiddish. I guess TextCat continues
to work with modern UTF-8 mail because there are enough pure ASCII ngrams for
many languages that use the Roman alphabet.

normalize_charset 1 should break the detection of Arabic altogether.
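
To illustrate the reasoning above, here is a rough Python sketch (not TextCat's
actual code, which is Perl and works from the models in the 'languages' file).
It compares byte trigrams of the same text in a legacy single-byte encoding and
in UTF-8. The sample strings, the helper names, and the choice of iso-8859-1
and windows-1256 as the "legacy" encodings are my assumptions for illustration
only.

```python
def byte_ngrams(data, n=3):
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ascii_only(grams):
    """Keep only the ngrams made entirely of 7-bit ASCII bytes."""
    return {g for g in grams if all(b < 0x80 for b in g)}


# Hypothetical sample text, chosen only for illustration.
german = "Sehr geehrte Damen und Herren, schöne Grüße aus München"
arabic = "مرحبا بالعالم"

# Roman alphabet: the pure ASCII trigrams are the same in both encodings,
# so a model built from legacy-encoded text still partly matches UTF-8 mail.
de_legacy = byte_ngrams(german.encode("iso-8859-1"))
de_utf8 = byte_ngrams(german.encode("utf-8"))
de_shared = de_legacy & de_utf8

# Arabic: assuming the model was built from a single-byte encoding such as
# windows-1256, UTF-8 input has no trigram in common with it.
ar_legacy = byte_ngrams(arabic.encode("cp1256"))
ar_utf8 = byte_ngrams(arabic.encode("utf-8"))
ar_shared = ar_legacy & ar_utf8

print(len(de_shared), "of", len(de_legacy), "German trigrams survive")
print(len(ar_shared), "of", len(ar_legacy), "Arabic trigrams survive")
```

Only the trigrams that touch an accented character differ for the German
sample, whereas the Arabic sample loses every trigram once it is converted to
UTF-8.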

