Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Deborah Goldsmith Tue, 02 Oct 2007 08:02:45 -0700

On Oct 2, 2007, at 5:11 AM, ChrisK wrote:

Deborah Goldsmith wrote:
UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
is what appears in the APIs for all of them. UTF-16 is also what's
stored in the volume catalog on Mac disks. UTF-8 is only used in BSD
APIs for backward compatibility. It's also used in plain text files (or
XML or HTML), again for compatibility.

Deborah
On OS X, Cocoa and Carbon use Core Foundation, whose API does not have a
one-true-encoding internally.  Follow the rather long URL for details:
http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
I would vote for an API that not just hides the internal store, but allows
different internal stores to be used in a mostly compatible way.
However, there is a UniChar typedef on OS X which is the same unsigned 16-bit
integer as Java's JNI would use.

UTF-16 is the type used in all the APIs. Everything else is considered an encoding conversion.

CoreFoundation uses UTF-16 internally except when the string fits entirely in a single-byte legacy encoding like MacRoman or MacCyrillic. If any kind of Unicode processing needs to be done to the string, it is first coerced to UTF-16. If it weren't for backwards compatibility issues, I think we'd use UTF-16 all the time, as the machinery for switching encodings adds complexity. I wouldn't advise it for a new library.
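
To make the storage scheme above concrete, here is a minimal Haskell sketch of a string type with two internal stores that is coerced to UTF-16 before any processing. It is an illustration only, not CoreFoundation's implementation: the names UString, Narrow, Wide, fromString, toUtf16, isNarrow and lengthUnits are all hypothetical, and Latin-1 stands in for a legacy encoding such as MacRoman because it maps to UTF-16 without a lookup table.

```haskell
import Data.Char (ord)
import Data.Word (Word8, Word16)

-- Hypothetical string type with two internal stores.
data UString
  = Narrow [Word8]   -- every character fits in one byte (Latin-1 here,
                     -- an assumed stand-in for a legacy encoding)
  | Wide [Word16]    -- UTF-16 code units

-- Choose the compact store when every character fits in one byte,
-- otherwise fall back to UTF-16.
fromString :: String -> UString
fromString s
  | all ((< 256) . ord) s = Narrow (map (fromIntegral . ord) s)
  | otherwise             = Wide (concatMap encodeUtf16 s)

-- Encode one code point as one or two UTF-16 code units.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                  , fromIntegral (0xDC00 + m `mod` 0x400) ]
  where
    n = ord c
    m = n - 0x10000

-- Coerce to UTF-16; Latin-1 bytes widen directly to code units.
toUtf16 :: UString -> [Word16]
toUtf16 (Narrow bs) = map fromIntegral bs
toUtf16 (Wide ws)   = ws

-- Report which store a string ended up in.
isNarrow :: UString -> Bool
isNarrow (Narrow _) = True
isNarrow (Wide _)   = False

-- An operation written once against UTF-16, whatever the store.
lengthUnits :: UString -> Int
lengthUnits = length . toUtf16
```

Every operation either has to go through toUtf16 or be written once per store, which is the extra machinery referred to above.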


Deborah
