How can I map a byte offset in a UTF-8 string back to the corresponding 
character offset in the NSString it came from?

I’m writing an Objective-C wrapper around a C text-tokenizer API that takes a 
UTF-8 string as input, and as part of its output returns byte ranges of words 
that it found. So my API takes an NSString, converts it to UTF-8, passes that 
to the C API, and then gets these byte offsets that it needs to convert into 
character offsets in the NSString.

I’ve looked through both the NSString and CFString APIs and didn’t see anything 
relevant to this. I know UTF-8 isn’t rocket science and I could pretty easily 
write my own function to scan through it counting characters, but I suspect I’d 
run into the differences between Unicode code points and the UTF-16 code units 
that NSString actually considers “characters”. I’d much rather let CF do this 
for me in an internally consistent way.
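
For illustration, here is a minimal sketch of the kind of hand-written scan described above, in plain C. It is not a CF or Foundation API: the function name utf16_offset_for_utf8_offset is hypothetical, and the code assumes the buffer is valid UTF-8 and that the offset falls on a character boundary. It counts UTF-16 code units (the units NSString indexes by), treating every 4-byte UTF-8 sequence as a surrogate pair.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CF or Foundation): returns the number of
 * UTF-16 code units encoded by the first byteOffset bytes of utf8.
 * Assumes valid UTF-8. If byteOffset lands inside a multi-byte sequence,
 * the character whose lead byte was already seen is counted in full. */
static size_t utf16_offset_for_utf8_offset(const char *utf8, size_t byteOffset)
{
    size_t units = 0;
    for (size_t i = 0; i < byteOffset; i++) {
        unsigned char b = (unsigned char)utf8[i];
        if ((b & 0xC0) == 0x80) {
            continue;               /* continuation byte: counted with its lead byte */
        }
        /* A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a code
         * point above U+FFFF, which UTF-16 stores as a surrogate pair. */
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

Each lookup scans from the start of the buffer, so converting many ranges this way costs O(n) per range; if the byte offsets arrive in ascending order, a single pass that carries the running count forward would avoid rescanning.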

—Jens