I have no idea how a linguistic tagger determines whitespace and whether it 
uses the same definition for whitespace as NSCharacterSet does. Given that it's 
multi-language-aware I wouldn't be shocked to find it uses some entirely 
different way of enumerating textual elements. 
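
For the NSCharacterSet half of that question, here is a minimal sketch of the quoted test in current Swift syntax (the quoted code uses `println` and `i++`, which later Swift versions removed). The helper name `whitespaceReport(for:)` and the sample string are my own, purely for illustration. It only shows what `CharacterSet.whitespaces` reports per Unicode scalar; it says nothing about what the linguistic tagger does internally.

```swift
import Foundation

// Hypothetical helper (not from the thread): classifies every Unicode
// scalar of `text` against CharacterSet.whitespaces. Working on scalars
// rather than UTF-16 units also covers code points above U+FFFF.
func whitespaceReport(for text: String) -> [(scalar: Unicode.Scalar, isWhite: Bool)] {
    let whiteSet = CharacterSet.whitespaces
    return text.unicodeScalars.map { scalar in
        (scalar: scalar, isWhite: whiteSet.contains(scalar))
    }
}

// Illustrative input: ASCII space, NO-BREAK SPACE, IDEOGRAPHIC SPACE,
// and one scalar outside the Basic Multilingual Plane.
let testString = "a b\u{00A0}c\u{3000}d\u{1F600}"

for (index, entry) in whitespaceReport(for: testString).enumerated() {
    let hex = String(format: "%#07x", entry.scalar.value)
    let note = entry.isWhite ? "whitespace" : "non-white"
    print("codePoint[\(index)] = \(hex) \(note)")
}
```

Running the same string through the tagger and comparing the two results would be one way to check whether the definitions actually agree.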

> On 6 Apr 2015, at 20:29, Gerriet M. Denkmann <gerri...@icloud.com> wrote:
> 
> 
>> On 4 Apr 2015, at 16:13, cocoa-dev-requ...@lists.apple.com wrote:
>> 
>> ok here’s my try, assuming NSLinguisticTagger knows what it’s doing. And yes 
>> it’s a bit stupid to use a linguistic tagger to do something like that but 
>> .. whatever 
> 
> Linguistic Tagger should use the same definition for "white" as 
> NSCharacterSet.whitespaceCharacterSet.
> If this is so, this would work for all characters (even if their Unicode code 
> point does NOT fit into an unsigned short):
> 
> import Cocoa
> 
> let whiteSet = NSCharacterSet.whitespaceCharacterSet()
> let testString = " ..."
> 
> var i : Int = 0
> for scalar in testString.unicodeScalars
> {
>       let uChar : UTF32Char = scalar.value
>       let isWhite = whiteSet.longCharacterIsMember(uChar)
>       let note = isWhite ? " whiteSpace " : " non white  "
>       
>       var stringWithScalar = "  "
>       stringWithScalar.append(scalar)
> 
>       let indexFormated = NSString(format: "%2d", i++)
> 
>       let codePoint = scalar.value    //      UInt32
>       let hexFormated = NSString(format: "%#07x", codePoint)
>       
>       println( "codePoint[" + indexFormated + "] = " + hexFormated + note + stringWithScalar)
> }
> 
> Gerriet.
> 
> 
> 

