[Courtesy cc to MHonArc mailing list]
On March 26, 1999 at 15:58, "Alexander Voropay" wrote:
> P.S. Could you include the Russian charset "koi8-r" in the supported charset
> list in CHARSETCONVERTERS by default? You can read more about
> "koi8-r" at http://nagual.pp.ru/~ache/koi8.html .
Someone may need to provide a charset converter for MHonArc to have it
included. For example, iso-2022-jp support was contributed.
Note, the converter may be trivial depending on the characteristics of
koi8-r, but I know nothing about it. Also, it will probably be hard
for me to write one myself since I could not verify that what I am doing
is right (if I knew Russian, it might not be a problem). However, I am
willing to help anyone who is familiar with koi8-r get
a converter written for MHonArc.
Looking at the site you gave, it appears koi8-r is 8-bit, and the 7-bit
characters coincide with US-ASCII. Maybe the mhonarc::htmlize routine
will suffice as a base converter.
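As a rough illustration of why an htmlize-style routine might be enough: since the lower 128 code points of koi8-r match US-ASCII, HTML escaping can operate on the raw bytes and leave the 8-bit Cyrillic bytes alone, provided the page is labeled as koi8-r. This is a Python sketch, not MHonArc's Perl. The name `htmlize_8bit` is hypothetical and is not mhonarc::htmlize.

```python
def htmlize_8bit(data: bytes) -> bytes:
    """Escape HTML-special ASCII bytes in an ASCII-compatible 8-bit charset.

    Bytes >= 0x80 (e.g. koi8-r Cyrillic letters) pass through untouched,
    so the result must be served with the original charset label.
    """
    # Escape '&' first so the entities added below are not double-escaped.
    out = data.replace(b"&", b"&amp;")
    out = out.replace(b"<", b"&lt;")
    out = out.replace(b">", b"&gt;")
    return out


# Usage: koi8-r bytes in, koi8-r bytes (with HTML escapes) out.
raw = "Привет <b> & ok".encode("koi8-r")
print(htmlize_8bit(raw).decode("koi8-r"))
```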
BTW, a potential problem with charsets in general is that HTML is not
good at supporting mixed charsets within a document. For example,
with MIME, I can have multiple charset specifications in a single
message. However, it appears that HTML only supports a global charset
specification for the entire document. The CHARSET attribute (as
defined in HTML 4.0) is only used in elements that refer to external
entities and not on a per-element basis. For example, the following is
not possible:
<p charset="koi8-r">Some Russian text here ...</p>
<p charset="iso-8859-2">Latin 2 text here ...</p>
<p charset="iso-2022-jp">Japanese text here ...</p>
I guess one will also run into encoding issues. I.e., a charset
specifies how a given character is represented, but does not deal
with encoding. I guess if documents adhere to an 8-bit encoding
scheme throughout the entire document, conflict may not be
a problem.
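To make the single-charset limitation concrete: in 8-bit charsets the same byte can denote different characters, so a document-wide label can be right for only one of the parts. A small Python sketch; byte 0xC1 is chosen only as an example.

```python
# One octet, two charsets: the same byte names different characters.
octet = b"\xc1"

as_koi8r = octet.decode("koi8-r")       # CYRILLIC SMALL LETTER A
as_latin2 = octet.decode("iso-8859-2")  # LATIN CAPITAL LETTER A WITH ACUTE

print(f"{as_koi8r!r} U+{ord(as_koi8r):04X}")
print(f"{as_latin2!r} U+{ord(as_latin2):04X}")
```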
In summary, a problem arises when a single message header contains
something like the following:
=?ISO-2022-JP?B?...?=
=?ISO-8859-3?B?...?=
=?KOI8-R?B?...?=
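On the parsing side, each encoded-word carries its own charset, so a header like the one above can at least be decoded piece by piece into Unicode. A minimal sketch using Python's standard `email.header.decode_header` (not part of MHonArc). `encoded_word` and `decode_mixed_header` are hypothetical helpers written for illustration.

```python
import base64
from email.header import decode_header


def encoded_word(text: str, charset: str) -> str:
    """Build an RFC 2047 'B' encoded-word such as =?KOI8-R?B?...?= (test helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset.upper()}?B?{payload}?="


def decode_mixed_header(value: str) -> str:
    """Decode a header whose encoded-words may each use a different charset."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, str):
            # decode_header returns the raw str when there are no encoded-words.
            pieces.append(chunk)
        else:
            # Chunks without a charset label are treated as US-ASCII.
            pieces.append(chunk.decode(charset or "us-ascii", errors="replace"))
    return "".join(pieces)


header = " ".join([
    encoded_word("Привет", "koi8-r"),
    encoded_word("Łódź", "iso-8859-2"),
    encoded_word("日本語", "iso-2022-jp"),
])
print(decode_mixed_header(header))
```

Decoding is the easy half. The open question is still how to emit the result as HTML when the page can carry only one charset.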
Unicode is a potential solution, but I am unsure of WWW client support
for Unicode (and my technical knowledge of Unicode is limited).
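Here is one way Unicode could sidestep the per-document charset limit, assuming clients handle it. In HTML 4.0, numeric character references always refer to ISO 10646 (Unicode) code points, whatever the document charset is. Each part can therefore be decoded with its own MIME charset and written out as pure ASCII with `&#NNNN;` references. This is a Python sketch. `parts_to_ascii_html` is a hypothetical helper, and whether browsers render the references correctly is exactly the open question above.

```python
import html


def parts_to_ascii_html(parts: list[tuple[bytes, str]]) -> str:
    """Render (raw_bytes, mime_charset) parts as one pure-ASCII HTML fragment."""
    paragraphs = []
    for raw, charset in parts:
        text = raw.decode(charset)             # charset-specific bytes -> Unicode
        text = html.escape(text, quote=False)  # escape &, <, >
        # Non-ASCII characters become decimal references such as &#1055;.
        ascii_text = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
        paragraphs.append(f"<p>{ascii_text}</p>")
    return "\n".join(paragraphs)


print(parts_to_ascii_html([
    ("Привет".encode("koi8-r"), "koi8-r"),
    ("Łódź".encode("iso-8859-2"), "iso-8859-2"),
    ("日本語".encode("iso-2022-jp"), "iso-2022-jp"),
]))
```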
Comments? Especially from Japanese-based users of MHonArc?
Are any users setting the <META http-equiv="Content-Type"
content="text/html; charset=XXXX"> in their MHonArc generated pages?
Or specifying a particular charset through the HTTP server?
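For reference, HTML 4.0 gives the charset parameter of an HTTP Content-Type header priority over a META http-equiv declaration. A server-supplied charset therefore overrides whatever is written into the generated page. Below is a Python sketch of that lookup order. The function names are hypothetical. The ISO-8859-1 fallback is an assumption, borrowed from the HTTP/1.1 default for text types that lack a charset.

```python
import re
from typing import Optional

# Matches the charset parameter of a Content-Type value, e.g. 'text/html; charset="koi8-r"'.
_CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter, or None if absent."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type)
    return match.group(1).lower() if match else None


def effective_charset(http_content_type: Optional[str],
                      meta_content: Optional[str],
                      fallback: str = "iso-8859-1") -> str:
    """Pick the page charset: HTTP header first, then META, then the fallback."""
    return (charset_param(http_content_type)
            or charset_param(meta_content)
            or fallback)


print(effective_charset("text/html; charset=KOI8-R", "text/html; charset=iso-8859-2"))
```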
--ewh
----
Earl Hood | University of California: Irvine
[EMAIL PROTECTED] | Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME