On 24 Sep 2014, at 12:23, Roland King <r...@rols.org> wrote:

>
>> On 24 Sep 2014, at 1:02 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>>
>>
>> On 24 Sep 2014, at 11:46, Roland King <r...@rols.org> wrote:
>>
>>>
>>>> On 24 Sep 2014, at 12:31 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de>
>>>> wrote:
>>>>
>>>> I have a problem with NSLinguisticTagger / CFStringTokenizer on iOS 8.0
>>>>
>>>> OS X 10.9.5 (and iOS 7 and earlier) parses "สีเหลือง" quite rightly as two
>>>> words: "สี" = colour and "เหลือง" = yellow.
>>>>
>>>> No dictionary will ever contain "yellow colour". Every dictionary will
>>>> contain "yellow" and "colour".
>>>> There are hundreds, if not thousands of these expressions, which are
>>>> wrongly classified as one word.
>>>> Might have something to do with the new predictive keyboard.
>>>>
>>>> But I am not writing this to complain, but to ask for a favour: could
>>>> anybody on 10.10 just click anywhere in: "สีเหลือง" and tell me whether
>>>> all gets highlighted, or just a part (as in 10.9.5)?
>>>
>>>
>>> If I double click anywhere on the right of that I get the second part (all
>>> bar the first character) highlighted. Clicking on the first character I get
>>> just that character. So 10.10 (beta 8) splits that sequence into two
>>> ‘words’.
>> This is a big relief. Thanks a lot.
>>
>>>
>>> Why do you suspect the predictive keyboard? Certainly wouldn’t be the first
>>> thing I thought of seeing that issue. I would probably instead assume I’d
>>> written myself a bug.
>>
>> Well, here is the code; maybe you can find a bug:
>>
>> let text = "สีเหลือง"
>> let opts: Int = 0
>> let schemes = [ NSLinguisticTagSchemeTokenType,
>>                 NSLinguisticTagSchemeNameTypeOrLexicalClass ]
>> let tagger = NSLinguisticTagger(tagSchemes: schemes, options: opts )
>>
>> let nsText = text as NSString
>> let length = nsText.length
>> tagger.string = nsText
>> let range = NSMakeRange(0,length)
>> let theScheme = NSLinguisticTagSchemeTokenType
>> let ops = NSLinguisticTaggerOptions(0)
>> tagger.enumerateTagsInRange (
>>     range,
>>     scheme: theScheme,
>>     options: ops,
>>     usingBlock:
>>     { ( tag: String!,
>>         tokenRange: NSRange,
>>         sentenceRange: NSRange,
>>         stop: UnsafeMutablePointer<ObjCBool>
>>       ) -> Void in
>>
>>         let word = nsText.substringWithRange(tokenRange)
>>         println("\(tag) = \(word) " )
>>     }
>> )
>>
>> Gerriet.
>>
>
>
>
> Here’s my version I was just writing - I ran it in an iOS playground AND in
> an OSX playground and get the same ‘single word’ result either time. So I’m
> not entirely sure that the click test on OSX proved anything. If you comment
> out the Thai string and uncomment Chinese one, it works better and splits
> stuff up although the last two words are wrong there as well, they should be
> ‘去“ and “健身房“. It’s the same in an OSX playground and an iOS one but then
> again iOS playgrounds are emulated so ..
>
> I also compiled it as an OSX command line tool and it does the same thing for
> my phrase AND yours. So whatever is doing the highlighting when you ‘click’
> isn’t the same thing NSLinguisticTagger is doing.
>
> The click test works on my chinese phrase too, it gets the last two words
> correct. Something sure ain’t right.
>
> Should write the objc version to eliminate any possibility it’s swift.

I have an app in ObjC using NSLinguisticTagger, which on 10.9.5 prints:

"我" = Word
"今天" = Word
"还" = Word
"没有" = Word
"去健" = Word   <-- wrong
"身房" = Word   <-- wrong

But when I click on "去" I just get "to go", and when I click on "健身房" I get "gym".

So, you are right: the clicking algorithm seems NOT to be using NSLinguisticTagger.
And I didn't go to the gym either.

Further investigation (again ObjC on 10.9.5):

CFStringTokenizer: as wrong as NSLinguisticTagger.

ICU 51.1: correct:
token[1] {0, 1} = "我" -- UnKnown Word --
token[2] {1, 2} = "今天" -- UnKnown Word --
token[3] {3, 1} = "还" -- UnKnown Word --
token[4] {4, 2} = "没有" -- UnKnown Word --
token[5] {6, 1} = "去" -- UnKnown Word --
token[6] {7, 3} = "健身房" -- UnKnown Word --

NSTextView (selectionRangeForProposedRange:granularity: NSSelectByWord) and
AttributedString (doubleClickAtIndex:): correct, same as ICU.

I thought that all of these were based on ICU, but this proves that I am wrong.

Probably I should use doubleClickAtIndex, now that iOS has AttributedStrings.

> let str = "สีเหลือง"
> //let str = "我今天还没有去健身房"
> let str2 = str as NSString
>
> let tagger = NSLinguisticTagger(tagSchemes:
> [NSLinguisticTagSchemeTokenType], options: 0 )
>
>
> let range = NSMakeRange( 0, str2.length )
>
> tagger.string = str2
>
> var ranges : NSArray?
> let things = tagger.tagsInRange( range, scheme:
> NSLinguisticTagSchemeTokenType, options: NSLinguisticTaggerOptions.allZeros,
> tokenRanges: &ranges )
> things.count
>
> ranges
>
> for ( index, type ) in enumerate( things )
> {
>     let type_range : NSValue? = ranges?[ index ] as NSValue?
>     print( "Type: '\(type)' at \(type_range!) ")
>     println( str2.substringWithRange(type_range! ) )
>
> }
>
>

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.