On Fri, Sep 21, 2001 at 05:36:15PM -0700, Paul Prescod wrote:
> I think we agree that all encoding issues are in the codecs and IO
> disciplines, right? So what's the big problem with having codecs that
> interpret surrogates? Codecs don't usually mind variable-width encodings
> -- that's what they are designed for.
> 
> Internally you can use UCS-4 for strings that have 4 byte chars, UCS-2
> for strings with only BMP chars and Latin-1 for strings with only 1-byte
> chars.

Wow, this conversation's lurching all over the place. I thought we were
talking about why it's unpleasant to encode Japanese outside the BMP, the
answer to which is "because we don't want to *have* to make codecs do all this
horrible stuff if we can avoid it". But I agree violently with what you've
said above.
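For anyone following along, the "horrible stuff" a surrogate-aware codec has to do is mostly this bit of arithmetic: a UTF-16 high/low surrogate pair each carries 10 bits of a code point above the BMP. A minimal sketch (my own helper, not any particular codec's real code):

```python
def decode_surrogate_pair(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # 0x10000 offset plus 10 bits from each surrogate
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

# U+1D11E (MUSICAL SYMBOL G CLEF) is encoded in UTF-16 as D834 DD1E:
print(hex(decode_surrogate_pair(0xD834, 0xDD1E)))
```

The arithmetic itself is trivial; the pain is that every codec and every piece of string-handling code downstream has to agree on when pairs get combined.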

> Remember that we're separating file formats from internal character set.
> So the interesting question is what character set(s) an application
> use(s) internally.

This is not easy to tell most of the time. :)
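The "pick the narrowest internal form" idea from the earlier quote is simple enough to sketch, though: scan the text once and use the smallest fixed-width representation that holds every character. (The function name and the returned labels are mine, purely for illustration, not anyone's actual API.)

```python
def narrowest_form(text: str) -> str:
    """Pick the smallest fixed-width representation that fits `text`."""
    widest = max((ord(c) for c in text), default=0)
    if widest <= 0xFF:
        return "latin-1"   # 1 byte per char
    if widest <= 0xFFFF:
        return "ucs-2"     # 2 bytes per char, BMP only
    return "ucs-4"         # 4 bytes per char

print(narrowest_form("hello"))           # all 1-byte chars
print(narrowest_form("\u65e5\u672c"))    # BMP chars (Japanese)
print(narrowest_form("\U0001D11E"))      # beyond the BMP
```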

> But a more interesting question would be about a word
> processor that was written more recently. The first version of Ichitaro
> was in 1985!

But you were asking about the software that people use, not the software
that was written recently!

> According to Sun, the Ichitaro people helped define Java's
> internationalization and have written an all-Java (i.e. Unicode) version
> of Ichitaro known as Ichitaro Ark.

Yeah, I think I was one of the beta testers for that. :)

So it uses Unicode internally. Which I think we're all agreed is a good thing.
I don't see how this relates to implementing interpreters, though.

-- 
On our campus the UNIX system has proved to be not only an effective software
tool, but an agent of technical and social change within the University.
- John Lions (U. of NSW)