On Oct 2, 2007, at 8:44 AM, Jonathan Cast wrote:
I would like to, again, strongly argue against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. FFI bindings have to convert data formats in any case; Haskell shouldn't gratuitously break Linux support (or make life harder on Linux) just to support proprietary operating systems better.

Now, if /independent of the details of MacOS X/, UTF-16 is better (objectively), it can be converted to anything by the FFI. But doing it the way Java or MacOS X or Win32 or anyone else does it, at the expense of Linux, I am strongly opposed to.

No one is advocating that. Any Unicode support library needs to support exporting text as UTF-8 since it's so widely used. It's used on Mac OS X, too, in exactly the same contexts it would be used on Linux. However, UTF-8 is a poor choice for internal representation.

On Oct 2, 2007, at 2:32 PM, Stefan O'Rear wrote:
UTF-8 supports CJK languages too. The only question is efficiency, and I believe CJK is still a relatively uncommon case compared to English and other Latin-alphabet languages. (That said, I live in a country all of whose dominant languages use the Latin alphabet.)

First of all, non-Latin countries already represent a large fraction of computer usage and the computer market. It is not at all "relatively uncommon." Japan alone is a huge market. China is a huge market.

Second, it's not just CJK, but anything that's not mostly ASCII: Russian, Greek, Thai, Arabic, Hebrew, and so on. UTF-8 is intended for compatibility with existing software that expects multibyte encodings. It doesn't work well as an internal representation. Again, no one is saying a Unicode library shouldn't have full support for input and output of UTF-8 (and other encodings).
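
To make the size question concrete, here is a rough sketch (not taken from any existing library) that counts how many bytes a string would occupy in UTF-8 and in UTF-16. The names utf8Bytes, utf16Bytes and encodedSizes are made up for illustration, and the sample strings are arbitrary. It only measures storage size, which is one part of the efficiency argument, and it uses nothing beyond base.

```haskell
import Data.Char (ord)

-- Bytes needed to encode a single code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. Cyrillic, Greek, Hebrew, Arabic
  | n < 0x10000 = 3   -- rest of the BMP, e.g. Thai, kana, most CJK
  | otherwise   = 4   -- supplementary planes
  where
    n = ord c

-- Bytes needed to encode a single code point in UTF-16:
-- one 16-bit unit inside the BMP, a surrogate pair outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) in bytes for a whole string.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))

main :: IO ()
main = mapM_ report samples
  where
    -- Hypothetical sample strings, chosen only for illustration.
    samples =
      [ ("English",  "hello")
      , ("Russian",  "привет")
      , ("Thai",     "สวัสดี")
      , ("Japanese", "こんにちは")
      ]
    report (name, s) =
      putStrLn (name ++ ": (UTF-8, UTF-16) = " ++ show (encodedSizes s))
```

ASCII text takes one byte per character in UTF-8 against two in UTF-16. Cyrillic, Greek, Hebrew and Arabic letters take two bytes in either encoding. Thai and CJK characters in the Basic Multilingual Plane take three bytes in UTF-8 against two in UTF-16.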

If you want to process ASCII text and squeeze out every last ounce of performance, use byte strings. Unicode strings should be optimized for representing and processing human language text, a large share of which is not in the Latin alphabet.

Remember, speakers of English and other Latin-alphabet languages are a minority in the world, though not in the computer-using world. Yet.

Deborah

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
