On December 26, 2004 at 13:04, Jeff Breidenbach wrote:

> Unfortunately, while MHonArc is fine at dealing with UTF-8 messages,
> it will choke on a UTF-8 configuration file. So the Serbian

Yes and no, depending on where the multi-byte encoding occurs.  But in
general, it is wise to avoid multi-byte sequences when possible.  I
believe a warning about this is somewhere in the MHonArc documentation.

> localization will need to be converted to ISO 10646 numerical
> character references. I don't personally know how to do a
> UTF-8 -> ISO 10646 conversion, but hopefully you can figure it
> out or maybe someone on gossip knows how to do it.

All you need is the Unicode code point value of each character; use
that value as the numeric character reference.  You can write a Perl
script using unpack to map the UTF-8 sequences into character entity
references.  Take a look at MHonArc's MHonArc::CharEnt module for one
implementation that does this (note: use the version in the latest
snapshot build, since it contains a fix to the invocation of unpack
for Perl versions >= 5.6).
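For illustration, here is a minimal sketch of the same idea in Python
rather than Perl.  It is not MHonArc::CharEnt and not part of MHonArc;
the function name utf8_to_ncr is hypothetical.  It assumes the whole
input is valid UTF-8 and leaves ASCII untouched, replacing every other
character with a decimal numeric character reference built from its
code point (Unicode and ISO 10646 share the same code point values).

```python
def utf8_to_ncr(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes and return pure ASCII
    text in which each non-ASCII character becomes &#NNNN; (decimal)."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on bad UTF-8
    out = []
    for ch in text:
        cp = ord(ch)  # the Unicode / ISO 10646 code point
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            out.append(f"&#{cp};")  # decimal numeric character reference
    return "".join(out)
```

For example, the Serbian Cyrillic word "Архива" becomes
"&#1040;&#1088;&#1093;&#1080;&#1074;&#1072;".  Python's built-in
"xmlcharrefreplace" error handler gives the same result in one call:
text.encode("ascii", "xmlcharrefreplace").  To convert a whole
resource file, read it in binary mode, pass the bytes through the
function, and write the ASCII result to a new file.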

--ewh

_______________________________________________
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip
