I have no idea how a linguistic tagger determines whitespace, or whether it uses the same definition of whitespace as NSCharacterSet does. Given that it's multi-language-aware, I wouldn't be shocked to find it uses some entirely different way of enumerating textual elements.
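
For what it's worth, the scalar-by-scalar check from the quoted message can be sketched in current Swift roughly as follows. This is only a sketch: `whitespaceFlags(in:)` and the sample string are made up for illustration and are not from the thread, and it only shows what Foundation's `CharacterSet.whitespaces` reports. It says nothing about what NSLinguisticTagger does internally.

```swift
import Foundation

// Hypothetical helper (not from the thread): for each Unicode scalar in
// `text`, report whether Foundation's CharacterSet.whitespaces contains it.
// Working on unicodeScalars means code points above U+FFFF are handled too.
func whitespaceFlags(in text: String) -> [(scalar: Unicode.Scalar, isWhite: Bool)] {
    let whiteSet = CharacterSet.whitespaces
    return text.unicodeScalars.map { scalar in
        (scalar: scalar, isWhite: whiteSet.contains(scalar))
    }
}

// Sample input chosen for illustration: a space, a no-break space (U+00A0),
// an ideographic space (U+3000), a letter, and an emoji outside the BMP.
let sample = " \u{00A0}\u{3000}a\u{1F600}"

for (index, entry) in whitespaceFlags(in: sample).enumerated() {
    let hex = String(entry.scalar.value, radix: 16, uppercase: true)
    let note = entry.isWhite ? "whitespace" : "non-white"
    print("codePoint[\(index)] = U+\(hex) \(note)")
}
```

Comparing these flags against the ranges a tagger labels as whitespace for the same string would be one way to find out whether the two definitions actually agree.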

> On 6 Apr 2015, at 20:29, Gerriet M. Denkmann <gerri...@icloud.com> wrote:
>
>> On 4 Apr 2015, at 16:13, cocoa-dev-requ...@lists.apple.com wrote:
>>
>> ok here’s my try, assuming NSLinguisticTagger knows what it’s doing. And yes
>> it’s a bit stupid to use a linguistic tagger to do something like that but
>> .. whatever
>
> Linguistic Tagger should use the same definition for "white" as
> NSCharacterSet.whitespaceCharacterSet.
> If this is so, this would work for all characters (even if their Unicode code
> point does NOT fit into an unsigned short):
>
> import Cocoa
>
> let whiteSet = NSCharacterSet.whitespaceCharacterSet()
> let testString = " ..."
>
> var i : Int = 0
> for scalar in testString.unicodeScalars
> {
>     let uChar : UTF32Char = scalar.value
>     let isWhite = whiteSet.longCharacterIsMember(uChar)
>     let note = isWhite ? " whiteSpace " : " non white "
>
>     var stringWithScalar = " "
>     stringWithScalar.append(scalar)
>
>     let indexFormated = NSString(format: "%2d", i++)
>
>     let codePoint = scalar.value // UInt32
>     let hexFormated = NSString(format: "%#07x", codePoint)
>
>     println( "codePoint[" + indexFormated + "] = " + hexFormated + note +
>         stringWithScalar)
> }
>
> Gerriet.