On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
Deborah Goldsmith wrote:

UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
is what appears in the APIs for all of them. UTF-16 is also what's
stored in the volume catalog on Mac disks. UTF-8 is only used in BSD
APIs for backward compatibility. It's also used in plain text files (or
XML or HTML), again for compatibility.

Deborah


On OS X, Cocoa and Carbon use Core Foundation, whose strings do not have a
single one-true encoding internally. Follow the rather long URL for details:

http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179

I would vote for an API that not only hides the internal store, but also allows
different internal stores to be used in a mostly compatible way.

However, there is a UniChar typedef on OS X, which is the same unsigned 16-bit
integer type that Java's JNI would use.
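
To make the "unsigned 16-bit integer" point concrete, here is a minimal
sketch in Python (not taken from any Apple or Java API; the function name
utf16_code_units is made up for illustration). It shows a string as a
sequence of 16-bit UTF-16 code units, including the surrogate pair needed
for a character outside the Basic Multilingual Plane:

```python
import struct
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units of `text` as unsigned 16-bit integers.

    Hypothetical helper for illustration only; each returned value is what
    would fit in one UniChar-sized (16-bit) slot.
    """
    raw = text.encode("utf-16-le")  # little-endian, no byte-order mark
    # "H" is an unsigned 16-bit integer in the struct module.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


# "A", e-acute, and U+1D11E (MUSICAL SYMBOL G CLEF, outside the BMP)
units = utf16_code_units("A\u00e9\U0001D11E")
print([hex(u) for u in units])
```

The last character occupies two code units (a high and a low surrogate), so
the number of 16-bit units is not always the number of characters.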

UTF-16 is the type used in all the APIs. Everything else is considered an encoding conversion.

CoreFoundation uses UTF-16 internally except when the string fits entirely in a single-byte legacy encoding like MacRoman or MacCyrillic. If any kind of Unicode processing needs to be done to the string, it is first coerced to UTF-16. If it weren't for backwards compatibility issues, I think we'd use UTF-16 all the time, as the machinery for switching encodings adds complexity. I wouldn't advise that approach for a new library.
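
The storage scheme described above can be sketched roughly as follows. This
is an illustration in Python under stated assumptions, not CoreFoundation's
actual implementation: the class name HybridString, its attributes, and the
order in which encodings are tried are all hypothetical. It only uses
Python's standard mac_roman and mac_cyrillic codecs.

```python
import struct


class HybridString:
    """Hypothetical string store: compact single-byte storage when the text
    fits a legacy encoding, UTF-16 otherwise."""

    # Assumed candidate list and order; the real library may differ.
    LEGACY_ENCODINGS = ("mac_roman", "mac_cyrillic")

    def __init__(self, text):
        self.encoding = None
        self.data = None
        for name in self.LEGACY_ENCODINGS:
            try:
                self.data = text.encode(name)  # one byte per character
                self.encoding = name
                break
            except UnicodeEncodeError:
                continue  # some character does not fit; try the next one
        if self.data is None:
            # Fallback: store UTF-16 code units (little-endian, no BOM).
            self.data = text.encode("utf-16-le")
            self.encoding = "utf-16-le"

    def code_units(self):
        """Coerce to UTF-16 code units before any Unicode processing."""
        if self.encoding == "utf-16-le":
            raw = self.data
        else:
            raw = self.data.decode(self.encoding).encode("utf-16-le")
        return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


for sample in ("caf\u00e9", "\u041f\u0440\u0438\u0432\u0435\u0442", "\u65e5\u672c\u8a9e"):
    s = HybridString(sample)
    print(s.encoding, len(s.data), s.code_units())
```

Even in this toy version, every operation has to branch on the stored
encoding, which is the extra machinery the paragraph above refers to.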

Deborah

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
