Matt Sergeant wrote:
> Wasn't there Unicode normalisation in the original email parser that I
> submitted to the project (that Theo turned into the current parser)?
> Certainly it would make sense to use that if you could. It works very
> well on a very large set of test data.
That code only deals with MIME-labeled charsets. It has no provision
for charset detection.
The code puts charset normalization inside of
Mail::SpamAssassin::Message::Node::decode(). I don't think charset
normalization is appropriate for the decode call that is used in parsing
message/rfc822 objects.
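
To illustrate the separation I have in mind, here is a minimal sketch. It is written in Python rather than Perl, and it is not SpamAssassin code: the function names decode_transfer and normalize_text are hypothetical. The first step only undoes the Content-Transfer-Encoding and returns bytes. The second step converts to Unicode using only the MIME-declared charset, and gives up (returns None) when the part is unlabeled or mislabeled, which is exactly the gap that charset detection would have to fill.

```python
import base64
import email
import quopri


def decode_transfer(raw: bytes, encoding: str) -> bytes:
    """Undo the Content-Transfer-Encoding only; the result is still bytes."""
    enc = (encoding or "7bit").strip().lower()
    if enc == "base64":
        return base64.b64decode(raw)
    if enc == "quoted-printable":
        return quopri.decodestring(raw)
    return raw  # 7bit, 8bit, binary: nothing to undo


def normalize_text(data: bytes, charset):
    """Convert bytes to str using the MIME-declared charset, with no detection.

    Returns None when there is no label, the label is unknown, or the
    bytes are not valid in the labeled charset.
    """
    if not charset:
        return None
    try:
        return data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return None


# A text part: transfer-decode first, then normalize as a separate step.
raw = base64.b64encode("héllo".encode("iso-8859-1"))
body = decode_transfer(raw, "base64")
text = normalize_text(body, "iso-8859-1")

# A message/rfc822 part: only the transfer-decoded bytes go to the parser.
nested_raw = b"Subject: inner\r\n\r\ninner body\r\n"
inner = email.message_from_bytes(decode_transfer(nested_raw, "7bit"))
```

As I read the problem, a nested message carries its own headers and its own per-part charset labels, so converting the whole blob with the outer part's charset before parsing could corrupt it. Keeping normalization out of the transfer-decoding step avoids that.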