At Sun, 2009-04-26, 09:01, Alastair Houghton <alast...@alastairs-place.net> wrote:
> At 2009 Apr 26, 04:33, Jeffrey Oleander wrote:
>> NSArray *tokens = [string componentsSeparatedByCharactersInSet:
>>     [NSCharacterSet whitespaceCharacterSet]];

> No, no, no.  If you read Gerriet's original post,
> you would have noticed that he even explained
> that what you just said won't work, because not
> all languages use whitespace to separate words
> like English does.
>
> You probably want to be using CFStringTokenizer(),
> at least on OS X.  For cross-platform code,
> ICU is probably your best bet.

Thanks for that info and the pointer to ICU:
http://userguide.icu-project.org/boundaryanalysis
Its BreakIterator finds character, word, line, or
sentence boundaries; "you provide an appropriate
CharacterIterator", and the text is passed as UChar *.

I was half expecting that response, because I was
aware that "not all languages use white space to
separate words", but I was hoping for some magic in
NSString.
Unfortunately, CFStringTokenizer is not available
in 10.3.9 (it was introduced in 10.5), and no, I do
not have a chest of silver or gold behind my pillow
with which to run around buying newer hardware and
software, let alone to do so every two years; we're
in re-bootstrapping mode in the land of the globalized
Bush-Clinton-Bush-Obama depression.

This makes ICU suspect: 
"Copyright (c) 2000 - 2008 IBM and Others"
Is Apple one of the "Others"?  The "using ICU" 
list is a mixed bag of reputable firms and 
unethical rogues, and I don't see any 
additional info on who is behind "ICU".

As much as I enjoy languages (I took a few
in college, and 10 years ago I was on a couple of
Unicode mailing lists, mainly to read the
interesting discussions about the differences),
for now I'll stick with the US+Euro+Japanese+
Latin+Hebrew solution I already have. It uses
Objective-C, doesn't drag me into the
complications of Objective-C++ or of shuttling
data between stores built on different
programming-language and framework conventions,
is something I can use immediately and trust,
and seems amenable to reasonable later
modification to handle the remote outliers.

Onward.



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)