https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
--- Comment #21 from Mark Martinec <[email protected]> 2011-05-09 15:41:58 UTC ---
> Btw we have a (somewhat forgotten) normalize_charset feature. :-) It converts
> rendered() body to latin1, using Encode::Detect and utf8::downgrade.

The normalize_charset feature suffers from two problems:

- it tries to *guess* a character set from a text sample, instead of
  taking the encoding information from a MIME subheader;

- Bug 5691 (Slow rules due to charset normalization) is still applicable.
  The test case attached there still takes 19 times as long as a
  non-UTF8 case using perl 5.12 (it used to be 30 times slower with
  older perl).

> I think we could discuss about it in some related or new bug. Maybe even 3.4
> could have it on by default.

I used to have normalize_charset enabled, but after being bitten by
extreme slowdowns on some mail messages, we can no longer afford
to use this feature on a production mailer. Too bad. Not something
that could be enabled by default in 3.4 if you ask me.

> In any case we probably need to keep the "lc-code" forever, since it could be
> hard to create textcat database with all case variations.. but we need to make
> sure we know the locale for body and handle accordingly.

True. Let's just keep things simple for 3.3.2 and apply this simple patch,
then we can open a new problem report to discuss the introduction of
fancier (but also riskier) stuff like proper handling of the encoding
of each message MIME part.
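
To illustrate the first problem listed above, here is a minimal sketch of taking the character set from each MIME part's Content-Type header instead of guessing it from a text sample. It is written in Python rather than SpamAssassin's Perl and is not SpamAssassin code; the sample message, the function name decode_text_parts and the us-ascii fallback are all assumptions made for illustration.

```python
from email import message_from_string

# Hypothetical two-part message: the first part declares its charset,
# the second one does not.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="iso-8859-2"
Content-Transfer-Encoding: quoted-printable

=B3=F3d=BC
--b1
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

plain ascii
--b1--
"""


def decode_text_parts(raw, fallback="us-ascii"):
    """Return (declared_charset, decoded_text) for every text/* part.

    The charset comes from the part's own Content-Type header; nothing
    is guessed from the body bytes. Parts with no (or an unknown)
    charset are decoded with the assumed fallback.
    """
    msg = message_from_string(raw)
    results = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text parts
        declared = part.get_content_charset()    # None if not declared
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        try:
            text = payload.decode(declared or fallback, errors="replace")
        except LookupError:                      # charset name unknown to Python
            text = payload.decode(fallback, errors="replace")
        results.append((declared, text))
    return results


if __name__ == "__main__":
    for declared, text in decode_text_parts(RAW):
        print(declared, repr(text))
```

A guess is only needed for parts where the declared charset is missing or unusable, which is the case a detector such as Encode::Detect would then have to cover.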
