On Tue, 2007-10-02 at 14:32 -0700, Stefan O'Rear wrote:
> UTF-8 supports CJK languages too. The only question is efficiency, and
> I believe CJK is still a relatively uncommon case compared to English
> and other Latin-alphabet languages. (That said, I live in a country all
> of whose dominant languages use the Latin alphabet)

As for space efficiency, I guess the argument could be made that since an
ideogram typically conveys a whole word, it is reasonable to spend more
bits on it.

Anyway, I am unsure whether I should take part in this discussion, as I'm
not really dealing with text as such in multiple languages. Most of my
data is ASCII, and when it is not, I'm happy to treat it ("treat" here
meaning "mostly ignore") as Latin1 bytes (the current ByteString) or UTF-8.
The only thing I miss is the ability to use String syntactic sugar -- but
IIUC, that's coming?

However, increased space usage is not acceptable, and I also don't want
any conversion layer that could conceivably modify my data (e.g. by
normalizing it or through error handling).

-k

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe