After some thought, I'm inclined to change the default value for iso-2022-jp to MHonArc::CharEnt::str2sgml.
Reason: By default, MHonArc should be as locale-neutral as possible, and the iso2022jp.pl filter is specific to a particular locale. By the same reasoning, I will also change the default value for iso-8859-1 to MHonArc::CharEnt::str2sgml. The use of mhonarc::htmlize assumes a Latin-1-based locale, since only HTML specials are converted.

The iso2022jp.pl filter will still be available. I will add a note about the change under the "Compatibility Notes" section of the release notes. The wording will be as follows:

  UPGRADING FROM v2.5.x OR EARLIER:

  Default iso-2022-jp Converter Changed

    In v2.6, the default charset converter for iso-2022-jp has
    changed to the following:

      <CharsetConverters>
      iso-2022-jp; MHonArc::CharEnt::str2sgml; MHonArc/CharEnt.pm
      </CharsetConverters>

    This filter converts all Japanese characters into Unicode
    character entity references (e.g. &#29305; for 特), removing
    the iso-2022-jp encoding. For some Japanese locales, this type
    of conversion may not be desired since some Japanese-aware
    processing tools may not support Unicode character entity
    references.

    If you want to preserve the iso-2022-jp encoding, you must
    explicitly specify the use of iso_2022_jp::str2html via the
    CHARSETCONVERTERS resource as follows:

      <CharsetConverters>
      iso-2022-jp; iso_2022_jp::str2html; iso2022jp.pl
      </CharsetConverters>

    The change to MHonArc::CharEnt::str2sgml as the default
    converter for iso-2022-jp was done to make MHonArc as
    locale-neutral as possible in its default configuration.

    For more information about using MHonArc in a Japanese locale,
    see (documents in Japanese):

      <http://www.shiratori.riec.tohoku.ac.jp/~p-katoh/Hack/Docs/mhonarc-jp/index.html>
      <http://www.shiratori.riec.tohoku.ac.jp/~p-katoh/Hack/Docs/mhonarc-jp/mhonarc-jp-2_4.html>

I figure there will be some objections to the change, but the main principle of locale neutrality is important IMO. Remember, this is only the default setting.
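To make the effect of the new default concrete, here is a rough sketch of the kind of conversion described above. It is written in Python purely for illustration and is not MHonArc code: the function name to_char_refs is made up, and I am assuming the converter behaves roughly like this (decode the charset, emit decimal character references for anything outside ASCII, escape the HTML specials). The actual Perl implementation may differ in its details.

```python
import html


def to_char_refs(raw: bytes, charset: str = "iso2022_jp") -> str:
    """Hypothetical helper: decode raw bytes, return 7-bit-safe HTML text."""
    text = raw.decode(charset)
    out = []
    for ch in text:
        if ord(ch) > 127:
            # Non-ASCII character: emit a decimal character reference.
            out.append(f"&#{ord(ch)};")
        else:
            # ASCII character: only the HTML specials (&, <, >) are escaped.
            out.append(html.escape(ch, quote=False))
    return "".join(out)


if __name__ == "__main__":
    # U+7279 is the kanji used as the example in the release note.
    sample = "\u7279".encode("iso2022_jp")
    print(to_char_refs(sample))
```

The result is plain ASCII that no longer depends on the reader's locale, which is the point of the change. The cost is the one the release note mentions: tools that expect raw iso-2022-jp text will not see it.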
Other locales that want to avoid Unicode character entity references will also have to change CHARSETCONVERTERS. For 8-bit sets, mhonarc::htmlize can be used.

BTW, I plan to document the various charset converter functions available in the CHARSETCONVERTERS resource page, in a manner similar to how MIMEFILTERS documents the various filters that are available.

Feedback is welcome. v2.6 is still some time away, so there is time to provide counterarguments to my decision.

--ewh
