On March 10, 2004 at 17:02, Ulrich Mayring wrote:

> In readmail.pl we had to fix a bug (it may be already fixed in cvs, as
> it was reported earlier by my colleague). The line
>
> &$encfunc(\$strtxt, $charset, $TextEncode);
>
> has to be replaced with
>
> &$encfunc(\$strtxt, $real_charset, $TextEncode);

I do not recall getting a bug report about this. I committed a change
into CVS (btw, patch diffs help me find the exact line much quicker).

> Also, in MAILdecode_1522_str we delete the 8th bit of all headers. This
> is compliant with the MIME standard. This line was added after getting
> the text encoder:
>
> $str =~ tr [\200-\377] [\000-\177];

Actually, this can be done by registering a "plain" character set
converter via CHARSETCONVERTERS. The plain converter is called on text
that is not non-ASCII encoded. Stripping of 8-bit characters is not
done by default, since there is enough broken usage of them in various
locales that it would cause more problems than it would solve. If
strict enforcement is required, CHARSETCONVERTERS can be used,
$mhonarc::CBRawMessageBodyRead can be used to pre-process the header
data, or a pre-processor, like procmail, can be used (see the appendix
of the documentation about callback functions).

> Also, in mhtxtplain.pl after ## Fixup any EOL mess:
>
> ## Fix invalid XML characters
> $$data =~ s/\x0c//g;
>
> This does not catch all invalid XML characters, but the one that killed
> us in our mails.

You may want to look at the $mhonarc::CBMessageBodyRead callback. Or,
you may want to "chain" the filter call so any changes to mhtxtplain.pl
will automatically be inherited by you. See
<http://www.mhonarc.org/archive/cgi-bin/mesg.cgi?a=mhonarc-users&i=200309161616.h8GGGc008461%40gator.earlhood.com>

--ewh
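
P.S. Since the quoted s/\x0c//g only removes form feeds, here is a rough sketch of a broader cleanup that could live in your own code instead of in a patched mhtxtplain.pl. The name strip_invalid_xml_ctrl is hypothetical and not part of MHonArc. It only deletes the control characters below 0x20 that XML 1.0 forbids; it does not handle other invalid code points. How you hook it in (a callback or a chained filter) is left out on purpose, so check the documentation for the exact callback arguments before wiring it up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MHonArc.
# Deletes the control characters that XML 1.0 does not allow,
# while keeping tab (\x09), line feed (\x0A) and carriage return (\x0D).
sub strip_invalid_xml_ctrl {
    my ($data_ref) = @_;
    # tr///d deletes every listed character in place and returns
    # how many characters it matched.
    my $removed = ($$data_ref =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d);
    return $removed;
}

# Small demonstration on a string holding a form feed and a NUL byte.
my $text  = "page one\x0Cpage two\x00\n";
my $count = strip_invalid_xml_ctrl(\$text);
print "removed $count character(s)\n";
```

The helper takes a reference, like the $$data in the quoted patch, so the text is modified in place rather than copied.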