Geoff> In other words, you're suggesting that the internal
Geoff> representation be done in UTF-8 and everything else be
Geoff> transformed to that? Interesting.

Well, that's what people seem to want to do: Tcl, Guile, and Gtk+ are
all doing this.  (Of these, Tcl doesn't use libunicode.)
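
As a rough illustration of the "transform everything to UTF-8 at the
boundary" idea, here is a minimal Python sketch. The function name
to_internal_utf8 and its parameters are hypothetical; this is not how
Tcl, Guile, Gtk+, or libunicode actually do it.

```python
def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode externally supplied bytes into UTF-8 for internal use.

    Hypothetical helper: the caller is assumed to know the encoding
    of the incoming data.
    """
    # Decode to abstract code points, then re-encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# "cafe" with an acute accent on the e, as a Latin-1 byte string.
latin1_input = b"caf\xe9"
internal = to_internal_utf8(latin1_input, "latin-1")
```

Everything past this point in the program would then only ever see
UTF-8, whatever encoding the input arrived in.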

Geoff> I haven't looked at any code, but I got the idea that Java
Geoff> stores everything in Unicode (UCS-2?) internally. Is there any
Geoff> benefit to one approach over the other?

Java uses UCS-2.  IMNSHO, UCS-2 is a terrible encoding.  It pretends
to be fixed-width, but really isn't, due to the existence of surrogate
pairs: a character outside the 16-bit range takes two 16-bit units.
As far as I can tell (and I've helped reimplement most of the Java
class libraries as part of the gcj project), Java ignores surrogates
-- oops.
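
To make the surrogate problem concrete, here is a small Python sketch
that lists the 16-bit code units a string needs when encoded as
UTF-16. The helper name utf16_code_units is made up for this
illustration; it only uses the standard library.

```python
import struct


def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of text in big-endian UTF-16."""
    data = text.encode("utf-16-be")
    # Each code unit is one unsigned 16-bit big-endian integer.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


# U+00E9 fits in a single 16-bit unit.
bmp_units = utf16_code_units("\u00e9")        # [0x00E9]

# U+10400 lies outside the 16-bit range and needs a surrogate pair.
astral_units = utf16_code_units("\U00010400")  # [0xD801, 0xDC00]
```

Code that treats every 16-bit unit as one character will count the
second string as two characters and may split it in the middle.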

UCS-4 is a better choice if you want a fixed-width encoding.  Guile
uses UCS-4 to represent single characters (a Scheme character is
UCS-4, but a Scheme string is UTF-8).
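
The relationship between a UTF-8 string and UCS-4 characters can be
sketched as a decoder that walks the bytes and yields one integer per
character. This is a simplified illustration, not Guile's or
libunicode's code: the name utf8_to_ucs4 is hypothetical, and the
sketch does not reject overlong forms or encoded surrogates as a
strict decoder would.

```python
def utf8_to_ucs4(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points (UCS-4)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte says how many continuation bytes follow.
        if lead < 0x80:
            value, extra = lead, 0
        elif lead >> 5 == 0b110:
            value, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:
            value, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:
            value, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated UTF-8 sequence at offset %d" % i)
        for byte in data[i + 1 : i + 1 + extra]:
            # Continuation bytes look like 10xxxxxx and carry 6 bits each.
            if byte & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (byte & 0x3F)
        code_points.append(value)
        i += extra + 1
    return code_points


# One-, two-, three-, and four-byte sequences in a single string.
sample = "a\u00e9\u20ac\U00010400".encode("utf-8")
chars = utf8_to_ucs4(sample)
```

The string costs between one and four bytes per character, while each
decoded character is a single fixed-width integer.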

It might make sense to add wide (UCS-4) string functions to
libunicode, but I don't know how far down this road I want to go.  My
goals are really pretty modest, but they are also fairly fuzzy and
undefined.

Tom
