> On 24 Sep 2014, at 1:02 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>
> On 24 Sep 2014, at 11:46, Roland King <r...@rols.org> wrote:
>
>>> On 24 Sep 2014, at 12:31 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de>
>>> wrote:
>>>
>>> I have a problem with NSLinguisticTagger / CFStringTokenizer on iOS 8.0
>>>
>>> OS X 10.9.5 (and iOS 7 and earlier) parses "สีเหลือง" quite rightly as two
>>> words: "สี" = colour and "เหลือง" = yellow.
>>>
>>> No dictionary will ever contain "yellow colour". Every dictionary will
>>> contain "yellow" and "colour".
>>> There are hundreds, if not thousands of these expressions, which are
>>> wrongly classified as one word.
>>> Might have something to do with the new predictive keyboard.
>>>
>>> But I am not writing this to complain, but to ask for a favour: could
>>> anybody on 10.10 just click anywhere in: "สีเหลือง" and tell me whether all
>>> gets highlighted, or just a part (as in 10.9.5)?
>>
>> If I double click anywhere on the right of that I get the second part (all
>> bar the first character) highlighted. Clicking on the first character I get
>> just that character. So 10.10 (beta 8) splits that sequence into two
>> ‘words’.
>
> This is a big relief. Thanks a lot.
>
>> Why do you suspect the predictive keyboard? Certainly wouldn’t be the first
>> thing I thought of seeing that issue. I would probably instead assume I’d
>> written myself a bug.
>
> Well, here is the code; maybe you can find a bug:
>
> let text = "สีเหลือง"
> let opts: Int = 0
> let schemes = [ NSLinguisticTagSchemeTokenType,
>                 NSLinguisticTagSchemeNameTypeOrLexicalClass ]
> let tagger = NSLinguisticTagger(tagSchemes: schemes, options: opts )
>
> let nsText = text as NSString
> let length = nsText.length
> tagger.string = nsText
> let range = NSMakeRange(0,length)
> let theScheme = NSLinguisticTagSchemeTokenType
> let ops = NSLinguisticTaggerOptions(0)
> tagger.enumerateTagsInRange (
>     range,
>     scheme: theScheme,
>     options: ops,
>     usingBlock:
>     { ( tag: String!,
>         tokenRange: NSRange,
>         sentenceRange: NSRange,
>         stop: UnsafeMutablePointer<ObjCBool>
>       ) -> Void in
>
>         let word = nsText.substringWithRange(tokenRange)
>         println("\(tag) = \(word) " )
>     }
> )
>
> Gerriet.
>
Here’s the version I was just writing. I ran it in an iOS playground and in an OS X playground and got the same ‘single word’ result both times, so I’m not entirely sure that the click test on OS X proved anything.

If you comment out the Thai string and uncomment the Chinese one, it works better and splits things up, although the last two words are wrong there as well; they should be ‘去’ and ‘健身房’. It’s the same in an OS X playground and an iOS one, but then again iOS playgrounds are emulated, so that comparison may not mean much.

I also compiled it as an OS X command line tool and it does the same thing for my phrase and for yours. So whatever does the highlighting when you ‘click’ isn’t doing the same thing as NSLinguisticTagger. The click test works on my Chinese phrase too; it gets the last two words correct.

Something sure ain’t right. I should write the ObjC version to eliminate any possibility that it’s Swift.

let str = "สีเหลือง"
//let str = "我今天还没有去健身房"

let str2 = str as NSString
let tagger = NSLinguisticTagger(tagSchemes: [NSLinguisticTagSchemeTokenType], options: 0 )
let range = NSMakeRange( 0, str2.length )
tagger.string = str2

// Ask for all token-type tags at once, together with their ranges.
var ranges : NSArray?
let things = tagger.tagsInRange( range, scheme: NSLinguisticTagSchemeTokenType, options: NSLinguisticTaggerOptions.allZeros, tokenRanges: &ranges )

// Bare expressions so the playground shows their values.
things.count
ranges

// Print each tag with its range and the substring it covers.
for ( index, type ) in enumerate( things )
{
    let type_range : NSValue? = ranges?[ index ] as NSValue?
    print( "Type: '\(type)' at \(type_range!) ")
    println( str2.substringWithRange(type_range! ) )
}
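
One further cross-check on where the word boundaries come from is a third Foundation API, the `.byWords` substring enumeration on `String`. The following is only a minimal sketch under stated assumptions: it is written for a current Swift toolchain rather than the Swift 1.0 syntax used above, `wordTokens` is a hypothetical helper name, and nothing in this thread establishes that this enumeration is what drives double-click selection. It just gives another tokenization to set beside the tagger's output.

```swift
import Foundation

// Hypothetical helper (not from the thread): collect the substrings that
// Foundation reports for `text` when enumerating with `.byWords`.
func wordTokens(_ text: String) -> [String] {
    var tokens: [String] = []
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .byWords) { substring, _, _, _ in
        // `substring` is nil only if .substringNotRequired is passed.
        if let word = substring {
            tokens.append(word)
        }
    }
    return tokens
}

// The two phrases discussed in this thread.
for sample in ["สีเหลือง", "我今天还没有去健身房"] {
    print(sample, "->", wordTokens(sample))
}
```

If this splits the Thai phrase in two while the tagger reports one token on the same OS version, that would point at the tagger rather than at the Swift bridging.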