Justin Mason wrote:
> +1 on the idea of charset normalization to a UTF-8 form, at least for the
> headers and one rendering of the message body. I think that should be the
> "body" rendering. I also think that the "rawbody" and "full" renderings
> should remain un-normalized, as should the "Header:raw" pseudo-header
> selector.
I agree on "full" and ":raw"--charset normalization makes no sense if
you aren't removing transfer encodings. I had been thinking normalization
was appropriate for rawbody, since that more correctly represents the
HTML form.
One could argue either way for the correct "rawbody" semantic.
Whichever one we choose, I can see SpamAssassin adding a new rule class
for the other semantic.
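To make the two candidate semantics concrete, here is a rough sketch of
the proposal as quoted above ("body" normalized, "rawbody" and "full"
left alone). It is Python rather than SpamAssassin code, and every name
in it (decode_transfer, normalize_charset, renderings) is made up for
illustration. It also ignores headers and HTML stripping, and the
Latin-1 fallback is only an assumption of the sketch.

```python
import base64
import quopri


def decode_transfer(payload: bytes, cte: str) -> bytes:
    """Undo the Content-Transfer-Encoding (hypothetical helper)."""
    cte = cte.lower()
    if cte == "base64":
        return base64.b64decode(payload)
    if cte == "quoted-printable":
        return quopri.decodestring(payload)
    return payload  # 7bit, 8bit, binary: nothing to undo


def normalize_charset(data: bytes, charset: str) -> bytes:
    """Re-encode bytes from the declared charset to UTF-8.

    Assumption of this sketch: fall back to Latin-1 when the charset
    label is unknown or the bytes do not decode under it.
    """
    try:
        text = data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        text = data.decode("latin-1")
    return text.encode("utf-8")


def renderings(payload: bytes, cte: str, charset: str) -> dict:
    """Build the three renderings for one text part."""
    decoded = decode_transfer(payload, cte)
    return {
        # Transfer encoding still in place, so normalizing is meaningless.
        "full": payload,
        # Transfer encoding removed, original charset kept.
        "rawbody": decoded,
        # Transfer encoding removed and charset normalized to UTF-8.
        "body": normalize_charset(decoded, charset),
    }
```

Under the other semantic, the "rawbody" entry would simply be run
through normalize_charset as well; nothing else changes.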
> +1 on fixing Mail::SpamAssassin::HTML as described.
Which description? There are three choices for when to skip the pack calls:
1) When charset normalization is enabled
2) When the installed HTML::Parser is 3.43 or later
3) Always (upping the minimum requirement for HTML::Parser to 3.43)
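The three choices differ only in the condition that guards the pack
calls. A rough decision-table sketch, again in Python with made-up names
(skip_pack, MIN_PARSER_VERSION), not actual Mail::SpamAssassin::HTML
code:

```python
MIN_PARSER_VERSION = 3.43  # the HTML::Parser version named in the list above


def skip_pack(choice: int, normalize_enabled: bool, parser_version: float) -> bool:
    """Return True if the pack calls would be skipped under the given choice.

    Versions are compared as plain numbers, the way Perl compares
    decimal module versions.
    """
    if choice == 1:
        # 1) Only when charset normalization is enabled.
        return normalize_enabled
    if choice == 2:
        # 2) Only when the installed HTML::Parser is new enough.
        return parser_version >= MIN_PARSER_VERSION
    if choice == 3:
        # 3) Always; the version check moves to install time instead.
        return True
    raise ValueError("choice must be 1, 2, or 3")
```

Note that only choice 1 ties the behaviour to a configuration setting;
choices 2 and 3 depend solely on which HTML::Parser is installed.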