Geoff> In other words, you're suggesting that the internal
Geoff> representation be done in UTF-8 and everything else be
Geoff> transformed to that? Interesting.
Well, that's what people seem to want to do: Tcl, Guile, and Gtk+ are
all doing this. (Of these, Tcl doesn't use libunicode.)
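
A minimal sketch of that arrangement, in Python purely for illustration: text is converted to UTF-8 once, at the boundary, and everything inside works on UTF-8 bytes. The name to_internal is hypothetical and is not taken from libunicode, Tcl, Guile, or Gtk+.

```python
def to_internal(raw: bytes, source_encoding: str) -> bytes:
    """Decode input from its external encoding and re-encode it as UTF-8.

    Hypothetical helper: the point is only that transcoding happens once,
    on the way in, so the rest of the program sees a single encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


# The same word arriving in two different external encodings...
from_latin1 = to_internal(b"caf\xe9", "iso-8859-1")
from_utf16 = to_internal("caf\u00e9".encode("utf-16"), "utf-16")

# ...ends up as the same internal byte string.
same_internally = from_latin1 == from_utf16
```

Output would go through the reverse conversion, so only the edges of the program need to know about any encoding other than UTF-8.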
Geoff> I haven't looked at any code, but I got the idea that Java
Geoff> stores everything in Unicode (UCS-2?) internally. Is there any
Geoff> benefit to one approach over the other?
Java uses UCS-2. IMNSHO, UCS-2 is a terrible encoding. It pretends
to be fixed-width, but really isn't, due to the existence of surrogate
pairs: a character outside the 16-bit range takes two 16-bit units.
As far as I can tell (and I've helped reimplement most of
the Java class libraries as part of the gcj project), Java ignores
surrogates -- oops.
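
To make the fixed-width problem concrete, here is a small sketch in Python (chosen only for illustration; it says nothing about how Java or gcj handle this). The helper name utf16_units is made up for the example. It shows that one character can occupy either one or two 16-bit units.

```python
def utf16_units(ch: str) -> list:
    """Return the 16-bit code units used to encode ch in big-endian UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A character inside the 16-bit range needs a single unit.
letter_a = utf16_units("A")

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside it and needs a surrogate pair.
g_clef = utf16_units("\U0001D11E")

for name, units in (("A", letter_a), ("U+1D11E", g_clef)):
    print(name, [hex(u) for u in units])
```

Code that assumes "one 16-bit unit is one character" would count the second case as two characters and could split it in the middle.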
UCS-4 is a better choice if you want a fixed-width encoding. Guile
uses UCS-4 to represent single characters (a Scheme character is
UCS-4, but a Scheme string is UTF-8).
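
The split between a variable-width string and fixed-width characters can be sketched as follows. This is an illustrative Python decoder, not Guile's or libunicode's code; utf8_code_points is a hypothetical name, and it assumes well-formed UTF-8 input (continuation bytes are not validated).

```python
def utf8_code_points(data: bytes):
    """Yield each character of a UTF-8 byte string as one integer code point.

    Assumes well-formed input; a real decoder would also reject bad
    continuation bytes, overlong forms, and truncated sequences.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: 1 byte
            length, cp = 1, lead
        elif lead >> 5 == 0b110:       # 110xxxxx: 2 bytes
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:      # 1110xxxx: 3 bytes
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:     # 11110xxx: 4 bytes
            length, cp = 4, lead & 0x07
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        i += length
        yield cp


stored = "a\u00e9\u20ac\U0001D11E".encode("utf-8")   # the string, as UTF-8
chars = list(utf8_code_points(stored))               # one integer per character
```

Here the stored string takes 1, 2, 3, and 4 bytes for its four characters, while each extracted character is a single integer that fits in 32 bits.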
It might make sense to add wide (UCS-4) string functions to
libunicode, but I don't know how far down this road I want to go. My
goals are really pretty modest, but they are also fairly fuzzy and
undefined.
Tom