> On 24 Sep 2014, at 1:02 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
> 
> 
> On 24 Sep 2014, at 11:46, Roland King <r...@rols.org> wrote:
> 
>> 
>>> On 24 Sep 2014, at 12:31 pm, Gerriet M. Denkmann <gerr...@mdenkmann.de> 
>>> wrote:
>>> 
>>> I have a problem with NSLinguisticTagger / CFStringTokenizer on iOS 8.0
>>> 
>>> OS X 10.9.5 (and iOS 7 and earlier) parses "สีเหลือง" quite rightly as two 
>>> words: "สี" = colour and "เหลือง" = yellow.
>>> 
>>> No dictionary will ever contain "yellow colour". Every dictionary will 
>>> contain "yellow" and "colour".
>>> There are hundreds, if not thousands of these expressions, which are 
>>> wrongly classified as one word.
>>> Might have something to do with the new predictive keyboard.
>>> 
>>> But I am not writing this to complain, but to ask for a favour: could 
>>> anybody on 10.10 just click anywhere in: "สีเหลือง" and tell me whether all 
>>> gets highlighted, or just a part (as in 10.9.5)?
>> 
>> 
>> If I double click anywhere on the right of that I get the second part (all 
>> bar the first character) highlighted. Clicking on the first character I get 
>> just that character. So 10.10 (beta 8) splits that sequence into two 
>> ‘words’. 
> This is a big relief. Thanks a lot.
> 
>> 
>> Why do you suspect the predictive keyboard? Certainly wouldn’t be the first 
>> thing I thought of seeing that issue. I would probably instead assume I’d 
>> written myself a bug.
> 
> Well, here is the code; maybe you can find a bug:
> 
> let text = "สีเหลือง"
> let opts: Int = 0
> let schemes = [ NSLinguisticTagSchemeTokenType, 
> NSLinguisticTagSchemeNameTypeOrLexicalClass ]
> let tagger = NSLinguisticTagger(tagSchemes: schemes, options: opts )
> 
> let nsText = text as NSString
> let length = nsText.length
> tagger.string = nsText
> let range = NSMakeRange(0,length)
> let theScheme = NSLinguisticTagSchemeTokenType
> let ops = NSLinguisticTaggerOptions(0)
> tagger.enumerateTagsInRange ( 
>       range, 
>       scheme:         theScheme, 
>       options:        ops,
>       usingBlock: 
>       {       (       tag:                    String!, 
>                       tokenRange:     NSRange, 
>                       sentenceRange:  NSRange, 
>                       stop:                   UnsafeMutablePointer<ObjCBool>
>               ) -> Void in
>               
>               let word = nsText.substringWithRange(tokenRange) 
>               println("\(tag) = \(word) " )
>       }
> )
> 
> Gerriet.
> 



Here’s the version I was just writing. I ran it in an iOS playground and in an 
OS X playground and got the same ‘single word’ result both times, so I’m not 
entirely sure that the click test on OS X proved anything. If you comment out 
the Thai string and uncomment the Chinese one, it works better and splits 
things up, although the last two words are wrong there as well: they should be 
‘去’ and ‘健身房’. The result is the same in an OS X playground and an iOS one, 
but then again iOS playgrounds are emulated, so .. 

I also compiled it as an OS X command-line tool and it does the same thing for 
my phrase and for yours. So whatever does the highlighting when you ‘click’ 
isn’t doing the same thing NSLinguisticTagger does. 

The click test works on my Chinese phrase too: it gets the last two words 
correct. Something sure ain’t right. 
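
One way to get a second opinion on the word boundaries is NSString's own word 
enumeration. The sketch below uses current Swift syntax rather than the 2014 
syntax in the rest of this thread, and `wordsByEnumeration` is a hypothetical 
helper name. Whether this is the code path behind double-click selection is an 
assumption, not something confirmed here.

```swift
import Foundation

// Hypothetical helper: collect the substrings that Foundation's
// word-level string enumeration reports for `text`.
func wordsByEnumeration(_ text: String) -> [String] {
    var words: [String] = []
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .byWords) { substring, _, _, _ in
        // `substring` is nil only when .substringNotRequired is passed.
        if let word = substring {
            words.append(word)
        }
    }
    return words
}

// The Thai phrase from this thread; the number of words reported
// may differ between OS releases, which is the point of the check.
let thaiWords = wordsByEnumeration("สีเหลือง")
print(thaiWords.count, thaiWords)
```

If this reports two words where the tagger reports one, the two are clearly 
not sharing a segmenter.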

I should write the Objective-C version to eliminate any possibility that it’s 
Swift. 



let str = "สีเหลือง"
//let str = "我今天还没有去健身房"
let str2 = str as NSString

let tagger = NSLinguisticTagger(tagSchemes:  [NSLinguisticTagSchemeTokenType], 
options: 0 )


let range = NSMakeRange( 0, str2.length )

tagger.string = str2

var ranges : NSArray?
let things = tagger.tagsInRange( range, scheme: NSLinguisticTagSchemeTokenType, 
options: NSLinguisticTaggerOptions.allZeros, tokenRanges: &ranges )
things.count

ranges

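for ( index, type ) in enumerate( things )
{
        // Each entry in ranges is an NSValue wrapping the token's NSRange.
        let type_range : NSValue? = ranges?[ index ] as NSValue?
        print( "Type: '\(type)' at \(type_range!) ")
        // substringWithRange takes an NSRange, so unwrap it with rangeValue.
        println( str2.substringWithRange( type_range!.rangeValue ) )
}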
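
Since the original question mentions CFStringTokenizer as well, the same 
string can be run through it directly to see whether it agrees with the 
tagger. This is a minimal sketch in current Swift syntax, assuming an Apple 
platform where CoreFoundation's tokenizer is available; `wordsByTokenizer` is 
a hypothetical helper name, and passing nil for the locale is a choice made 
here, not something taken from the thread.

```swift
import Foundation

// Hypothetical helper: split `text` into word tokens with CFStringTokenizer.
func wordsByTokenizer(_ text: String) -> [String] {
    let cfText = text as CFString
    let fullRange = CFRangeMake(0, CFStringGetLength(cfText))
    // nil locale: let the tokenizer pick its default behaviour (assumption).
    let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, cfText, fullRange,
                                            kCFStringTokenizerUnitWord, nil)
    var words: [String] = []
    // An empty token type means there are no more tokens.
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        let tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer)
        if let token = CFStringCreateWithSubstring(kCFAllocatorDefault,
                                                   cfText, tokenRange) {
            words.append(token as String)
        }
    }
    return words
}

// The Thai phrase from this thread; print whatever the tokenizer finds.
let tokenizerWords = wordsByTokenizer("สีเหลือง")
print(tokenizerWords.count, tokenizerWords)
```

Running this on both OS versions alongside the tagger code would show 
whether the change is in the tokenizer itself or somewhere above it.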


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
