On Aug 18, 2008, at 10:57 PM, Michael Ash wrote:

Note that depending on what kind of results you want, even if all of
your data is within the BMP, this *still* won't save you.

As a really basic example, consider a simple, obvious character like
é. (That's an e with an acute accent on it if you're having Unicode
trouble in your e-mail client.) That can be represented as two
separate Unicode code points, a plain old ASCII e followed by a
combining accent mark. If you should happen to split the string on the
accent mark, such that the e goes into the first half and the
combining accent mark goes into the second half, you get a really
unintuitive result. What appears to the user to be a single character
gets suddenly blown in two. Worse, if you happen to insert a string in
the middle, you could end up applying that acute accent to some
*other* letter instead.

Sorry, I failed to mention that our UTF-16BE data was also normalized to precomposed Unicode, so this case was handled.
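
For anyone who wants to see the difference, here is a rough sketch in Python (not Cocoa) using the standard unicodedata module. It assumes that "precomposed" corresponds to normalization form C (NFC); the sample string and variable names are made up for illustration and are not from our actual data.

```python
import unicodedata

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT: the decomposed spelling of "café".
decomposed = "cafe\u0301"

# NFC folds the e + accent pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Splitting the decomposed string at index 4 strands the accent in the second half.
head, tail = decomposed[:4], decomposed[4:]

print(len(decomposed), len(composed))
print(repr(head), repr(tail))
print(repr(composed[:3]), repr(composed[3:]))
```

In the decomposed string the second half begins with a bare combining mark, which would attach to whatever gets inserted before it. After normalization the é is one code point, so no split index can fall inside it.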

You mentioned Korean (which I have yet to play around with), but for another grand ol' time, try Arabic. There you get into something called "positional variants", where the shape of a letter depends on where it falls in the word. But alas, that's outside the scope of this list.

I think the moral of the story here is that when working with Unicode data, it's best to normalize such data and then ensure that the APIs operating on the data are Unicode-savvy.
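
To illustrate why both halves of that advice matter, here is another hedged Python sketch. Some base-plus-accent pairs have no precomposed form, so normalization alone leaves them as two code points, and the splitting code still has to be careful. The helper name safe_split_index is hypothetical, and it only checks for characters with a nonzero combining class; it is a simplification, not full grapheme-cluster segmentation. In Cocoa, the analogous tool would be NSString's rangeOfComposedCharacterSequenceAtIndex:.

```python
import unicodedata

def safe_split_index(text, index):
    """Move index left until it no longer points at a combining mark.

    Hypothetical helper: handles only characters with a nonzero
    canonical combining class, not every kind of grapheme cluster.
    """
    while 0 < index < len(text) and unicodedata.combining(text[index]) != 0:
        index -= 1
    return index

# 'q' + COMBINING ACUTE ACCENT has no precomposed code point,
# so NFC leaves the pair as two separate code points.
s = unicodedata.normalize("NFC", "xq\u0301y")

# A naive split at index 2 would separate 'q' from its accent.
i = safe_split_index(s, 2)
first, second = s[:i], s[i:]

print(len(s), i)
print(repr(first), repr(second))
```

The split point moves back from 2 to 1, so the q and its accent stay together in the second half.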

Thankfully, as you've pointed out, NSString and the related APIs shield folks from many of the gory details.

___________________________________________________________
Ricky A. Sharp         mailto:[EMAIL PROTECTED]
Instant Interactive(tm)   http://www.instantinteractive.com



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
