On 9 Dec 2013, at 16:53, Stephen J. Butler <stephen.but...@gmail.com> wrote:

> Would converting each string to NFD (decomposedStringWithCanonicalMapping) be 
> an acceptable workaround in this case?
No, it would not. I am changing all my rangeOfString calls to use 
NSLiteralSearch, which does not have these strange effects.
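
A minimal sketch of that change, using the two strings from this thread. The helper names DefaultRange and LiteralRange are made up for this illustration. The strings are written with \u escapes so that copy+paste cannot silently replace U+FA0A. Only the literal result is stated in a comment: NSLiteralSearch compares UTF-16 units one by one, so it cannot match here. What the default search returns is the open question of this thread, so the code just prints it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: search with the default options (no literal flag).
static NSRange DefaultRange(NSString *haystack, NSString *needle)
{
    return [haystack rangeOfString: needle];
}

// Hypothetical helper: exact unit-by-unit comparison of the UTF-16 contents.
static NSRange LiteralRange(NSString *haystack, NSString *needle)
{
    return [haystack rangeOfString: needle options: NSLiteralSearch];
}

int main(void)
{
    @autoreleasepool
    {
        NSString *haystack = @"\u89c1=\ufa0a\u898b";   // 0x89c1 0x003d 0xfa0a 0x898b
        NSString *needle   = @"\u89c1\u2260\u898b";    // 0x89c1 0x2260 0x898b

        // Result of the default search is deliberately not asserted here.
        NSLog(@"default: %@", NSStringFromRange(DefaultRange(haystack, needle)));

        // Literal search: location is NSNotFound, because 0x2260 never occurs in haystack.
        NSLog(@"literal: %@", NSStringFromRange(LiteralRange(haystack, needle)));
    }
    return 0;
}
```

The literal search also keeps U+FA0A and U+898B distinct, which is what I need.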

Gerriet.

> 
> 
> On Mon, Dec 9, 2013 at 3:43 AM, Stephen J. Butler <stephen.but...@gmail.com> 
> wrote:
> OK, you are right. Copy+paste didn't preserve the compatibility character. 
> It does look like a bug of sorts, or at least something a Unicode expert 
> should explain.
> 
> 
> On Mon, Dec 9, 2013 at 3:20 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> 
> wrote:
> 
> On 9 Dec 2013, at 16:00, Stephen J. Butler <stephen.but...@gmail.com> wrote:
> 
> > I don't get the same result. 10.9.0, Xcode 5.0.2. I created an empty 
> > command line utility, copied the code, and I get NSNotFound.
> >
> > 2013-12-09 02:50:19.822 Test[73850:303] main "见≠見" (3 shorts) occurs in "见=見見" (4 shorts) at {9223372036854775807, 0}
> 
> Copying might invoke another bug.
> Better check the characters, like:
> 
> - (void)printString: (NSString *)line
> {
>     NSLog(@"%s \"%@\" has characters:", __FUNCTION__, line);
> 
>     // Visit each composed character sequence and dump its code points.
>     [line enumerateSubstringsInRange: NSMakeRange(0, [line length])
>                              options: NSStringEnumerationByComposedCharacterSequences
>                           usingBlock: ^(NSString *currChar, NSRange currCharRange, NSRange enclosingRange, BOOL *stop)
>     {
>         (void)enclosingRange;
>         (void)stop;
> 
>         // UTF-32 in host byte order, so each code point is one unsigned int.
>         #ifdef __LITTLE_ENDIAN__
>             NSStringEncoding encoding = NSUTF32LittleEndianStringEncoding;
>         #else
>             NSStringEncoding encoding = NSUTF32BigEndianStringEncoding;
>         #endif
>         NSData *data = [currChar dataUsingEncoding: encoding];
> 
>         NSUInteger nbrBytes = [data length];
>         NSUInteger nbrChars = nbrBytes / sizeof(unsigned int);
> 
>         if ( nbrChars * sizeof(unsigned int) != nbrBytes )    // error
>         {
>             NSLog(@"%s Error: strange nbr of bytes %lu", __FUNCTION__, (unsigned long)nbrBytes);
>             return;
>         }
> 
>         unsigned int codePoint[nbrChars];
>         [data getBytes: codePoint length: nbrBytes];
> 
>         // One output line: range, code points in hex, then the character itself.
>         NSMutableString *s = [NSMutableString stringWithFormat: @"%@ = ", NSStringFromRange(currCharRange)];
>         for ( NSUInteger i = 0; i < nbrChars; i++ )
>         {
>             [s appendFormat: @"%#06x ", codePoint[i]];
>         }
> 
>         [s appendFormat: @"= \"%@\"", currChar];
> 
>         fprintf(stderr, "%s\n", [s UTF8String]);
>     }];
> }
> 
> and check for:
> "见=見見" has characters:
> {0, 1} = 0x89c1 = "见"
> {1, 1} = 0x003d = "="
> {2, 1} = 0xfa0a = "見"
> {3, 1} = 0x898b = "見"
> "见≠見" has characters:
> {0, 1} = 0x89c1 = "见"
> {1, 1} = 0x2260 = "≠"
> {2, 1} = 0x898b = "見"
> 
> >
> > On Mon, Dec 9, 2013 at 2:43 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> 
> > wrote:
> >
> > On 9 Dec 2013, at 15:05, Quincey Morris 
> > <quinceymor...@rivergatesoftware.com> wrote:
> >
> > > On Dec 8, 2013, at 23:46 , Gerriet M. Denkmann <gerr...@mdenkmann.de> 
> > > wrote:
> > >
> > >> NSString *b = @"见≠見";                //      0x89c1  0x2260  0x898b
> > >
> > > So what are the results with:
> > >
> > >> NSString *b = @"见";
> > >> NSString *b = @"≠";
> > >> NSString *b = @"見";
> > > ?
> > >
> > >  Does specifying an explicit locale make any difference?
> >
> > Explicitly specifying en_US (as probably the best tested and debugged) 
> > makes no difference.
> >
> 
> 
> 


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

