http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4636
------- Additional Comments From [EMAIL PROTECTED] 2006-01-10 03:07 -------
I've commented in the past that I'm opposed to the idea of character set normalization and that this functionality would need to be isolated with options and/or a plugin interface.

My reasoning is as follows:

- there is a performance penalty associated with character set recoding
- spam patterns are generally encoded in a limited number of character sets
- therefore, catch rates do not increase with recoding (if anything, they are quite likely to decrease due to spam tricks causing us to pick the wrong character set)
- however, ham catch rates will INCREASE since the amount of ham matching the pattern is likely to increase (matches being accidental)
- so, S/O is likely to go down for multi-character set rules (more often than not) and performance will go down as well

For these reasons, I am -1 (that is, vetoing) the current form of this code that has the performance loss and requires recoding. I would also be -1 on requiring any non-utf8 rules to be utf8.

Basically, SpamAssassin does need better understanding of character set and ability to support more character sets better, for rules, descriptions, rendering, and tokenization, but I see no benefit to recoding messages, especially since anti-spam patterns are written against a small subset of possible encodings.
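
To make the S/O argument above concrete, here is a minimal sketch in Python. It assumes S/O is the share of a rule's hits that land on spam, computed from the percentage of the spam corpus and of the ham corpus that the rule hits. The function name and all the numbers are hypothetical and chosen only for illustration; they are not measurements from any corpus.

```python
def s_o_ratio(spam_hit_pct, ham_hit_pct):
    """Return spam hits as a fraction of all hits (assumed S/O definition)."""
    total = spam_hit_pct + ham_hit_pct
    if total == 0:
        raise ValueError("rule hit nothing, S/O is undefined")
    return spam_hit_pct / total


# Hypothetical rule before recoding: hits 2% of spam and 0.01% of ham.
before = s_o_ratio(2.0, 0.01)

# Hypothetical rule after recoding: the spam hit rate is unchanged, but
# accidental matches raise the ham hit rate to 0.05%.
after = s_o_ratio(2.0, 0.05)

print(f"S/O before recoding: {before:.3f}")
print(f"S/O after recoding:  {after:.3f}")
```

With the spam hit rate held constant, any increase in accidental ham matches lowers S/O, which is the effect the list above describes for multi-character set rules.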
