Simon Cozens wrote:
> Simon Cozens wrote:
> > > Urgh, this is tricky. Once you move outside of the BMP, the encodings you
> > > *really* want to work stop working.
>
> UCS-2 is only defined for characters inside the Basic Multilingual Plane;
> UTF-16 has to use surrogates for non-BMP characters, and that sucks too
> because what used to be a nice fixed-width encoding has suddenly gone
> variable-width on you. You didn't want that to happen. UTF-8 uses surrogates
> too, which is screaming difficult to process.

I think we agree that all encoding issues are in the codecs and IO
disciplines, right? So what's the big problem with having codecs that
interpret surrogates? Codecs don't usually mind variable-width encodings
-- that's what they are designed for.
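
To make that concrete, here is a minimal sketch of the surrogate handling
a UTF-16 codec has to do. It assumes the input has already been split into
16-bit code units; the function name decode_utf16_units is made up for
illustration and is not part of any existing codec API.

```python
def decode_utf16_units(units):
    """Turn a sequence of 16-bit code units into a list of code points.

    A high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF)
    is combined into one non-BMP code point. Anything else, including an
    unpaired surrogate, is passed through unchanged in this sketch.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= unit <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# 'A' followed by U+1D11E (MUSICAL SYMBOL G CLEF) as a surrogate pair
result = decode_utf16_units([0x0041, 0xD834, 0xDD1E])
```

The variable width stays inside the decoder; what comes out is one
integer per character.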

Internally you can use UCS-4 for strings that contain non-BMP (4-byte)
chars, UCS-2 for strings with only BMP chars, and Latin-1 for strings
with only 1-byte chars.
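
A rough sketch of that selection, assuming the string is held as a list of
integer code points after decoding. The names narrowest_width and pack are
hypothetical, and little-endian byte order is an arbitrary choice for the
example.

```python
def narrowest_width(code_points):
    """Return the bytes per character needed to store every code point."""
    highest = max(code_points, default=0)
    if highest <= 0xFF:
        return 1      # fits in Latin-1
    if highest <= 0xFFFF:
        return 2      # fits in UCS-2 (BMP only)
    return 4          # needs UCS-4


def pack(code_points):
    """Store the string fixed-width, using the narrowest width that fits."""
    width = narrowest_width(code_points)
    data = b"".join(cp.to_bytes(width, "little") for cp in code_points)
    return width, data


width, data = pack([0x41, 0x3042])   # 'A' plus HIRAGANA LETTER A
```

Whichever width is picked, every character in a given string takes the
same number of bytes, so indexing stays a simple multiplication.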
>...
> Oh, I forget there are non-Unix platforms. :) I dunno what things like
> Ichitaro use for a file format.
Remember that we're separating file formats from the internal character
set. So the interesting question is what character set(s) an application
uses internally. But a more interesting question would be about a word
processor that was written more recently. The first version of Ichitaro
was in 1985!
According to Sun, the Ichitaro people helped define Java's
internationalization and have written an all-Java (i.e. Unicode) version
of Ichitaro known as Ichitaro Ark.
"The facility takes full advantage of the internationalization features
of the Java 2 platform. It supports any characters in the unicode
character set -- including Japanese, Chinese, and Korean -- and allows
users to switch menu and dialog boxes to their language of choice, while
inputting different Asian language characters "on-the-fly" via the
selected input method."
http://java.sun.com/features/1999/07/risingsun.html
"In December, 1999, Ichitaro Ark, a word processing program using 100%
Pure Java, became available from Justsystem Corporation, a major
software company in Japan, at the list price of 8,900 yen plus tax."
http://www.justsystem.co.jp/ark/review/04e.html
Paul Prescod