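Would converting each string to NFD (decomposedStringWithCanonicalMapping) be an acceptable workaround in this case? U+FA0A is a CJK compatibility ideograph whose canonical decomposition is U+898B, so after normalization both strings would contain only the unified character.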
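
Roughly what I have in mind is below. It is only a sketch: RangeOfNormalizedString is a made-up helper name, and the choice of NSLiteralSearch is my assumption, not something taken from the original code. I have not checked whether it avoids the behaviour reported in this thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the thread: normalize both strings to NFD
// and then compare code units literally.
// Note: the returned range indexes into the NORMALIZED haystack, which can
// differ in length from the original string.
static NSRange RangeOfNormalizedString(NSString *haystack, NSString *needle)
{
    NSString *h = [haystack decomposedStringWithCanonicalMapping];
    NSString *n = [needle decomposedStringWithCanonicalMapping];
    return [h rangeOfString:n options:NSLiteralSearch];
}

int main(void)
{
    @autoreleasepool {
        // Strings from the thread, spelled with escapes so that copy and
        // paste cannot silently replace the compatibility character U+FA0A.
        NSString *a = @"\u89c1=\ufa0a\u898b";
        NSString *b = @"\u89c1\u2260\u898b";

        NSRange r = RangeOfNormalizedString(a, b);
        if (r.location == NSNotFound) {
            NSLog(@"not found");
        } else {
            NSLog(@"found at %@", NSStringFromRange(r));
        }
    }
    return 0;
}
```

One caveat: because the search runs on the normalized copies, the range would have to be mapped back if you need positions in the original string.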

On Mon, Dec 9, 2013 at 3:43 AM, Stephen J. Butler <stephen.but...@gmail.com> wrote:

> OK, you are right. Copy+paste didn't preserve the compatibility character.
> Does look like a bug of sorts, or at least something a unicode expert
> should explain.
>
> On Mon, Dec 9, 2013 at 3:20 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>
>> On 9 Dec 2013, at 16:00, Stephen J. Butler <stephen.but...@gmail.com> wrote:
>>
>> > I don't get the same result. 10.9.0, Xcode 5.0.2. I created an empty command line utility, copied the code, and I get NSNotFound.
>> >
>> > 2013-12-09 02:50:19.822 Test[73850:303] main "见≠見" (3 shorts) occurs in "见=見見" (4 shorts) at {9223372036854775807, 0}
>>
>> Copying might invoke another bug.
>> Better check the characters, like:
>>
>> - (void)printString: (NSString *)line
>> {
>>     NSLog(@"%s \"%@\" has characters:",__FUNCTION__, line);
>>
>>     [ line enumerateSubstringsInRange: NSMakeRange( 0, [ line length ] )
>>                               options: NSStringEnumerationByComposedCharacterSequences
>>                            usingBlock: ^(NSString *currChar, NSRange currCharRange, NSRange enclosingRange, BOOL *stop)
>>         {
>>             (void)enclosingRange;
>>             (void)stop;
>>
>>             #ifdef __LITTLE_ENDIAN__
>>             NSStringEncoding encoding = NSUTF32LittleEndianStringEncoding;
>>             #else
>>             NSStringEncoding encoding = NSUTF32BigEndianStringEncoding;
>>             #endif
>>             NSData *data = [ currChar dataUsingEncoding: encoding ];
>>
>>             NSUInteger nbrBytes = [ data length ];
>>             NSUInteger nbrChars = nbrBytes / sizeof(unsigned int);
>>
>>             if ( nbrChars * sizeof(unsigned int) != nbrBytes ) // error
>>             {
>>                 NSLog(@"%s Error: strange nbr of bytes %lu",__FUNCTION__, nbrBytes);
>>                 return;
>>             };
>>
>>             unsigned int codePoint[nbrChars];
>>             [ data getBytes: &codePoint length: nbrBytes ];
>>
>>             NSMutableString *s = [ NSMutableString stringWithFormat: @"%@ = ", NSStringFromRange(currCharRange) ];
>>             for( NSUInteger i = 0; i < nbrChars; i++ )
>>             {
>>                 [ s appendFormat: @"%#06x ", codePoint[i] ];
>>             };
>>
>>             [ s appendFormat: @"= \"%@\"", currChar ];
>>
>>             fprintf(stderr, "%s\n", [ s UTF8String]);
>>         }
>>     ];
>> }
>>
>> and check for:
>> "见=見見" has characters:
>> {0, 1} = 0x89c1 = "见"
>> {1, 1} = 0x003d = "="
>> {2, 1} = 0xfa0a = "見"
>> {3, 1} = 0x898b = "見"
>> "见≠見" has characters:
>> {0, 1} = 0x89c1 = "见"
>> {1, 1} = 0x2260 = "≠"
>> {2, 1} = 0x898b = "見"
>>
>> > On Mon, Dec 9, 2013 at 2:43 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>> >
>> > On 9 Dec 2013, at 15:05, Quincey Morris <quinceymor...@rivergatesoftware.com> wrote:
>> >
>> > > On Dec 8, 2013, at 23:46 , Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>> > >
>> > >> NSString *b = @"见≠見"; // 0x89c1 0x2260 0x898b
>> > >
>> > > So what are the results with:
>> > >
>> > >> NSString *b = @"见";
>> > >> NSString *b = @"≠";
>> > >> NSString *b = @"見";
>> > > ?
>> > >
>> > > Does specifying an explicit locale make any difference?
>> >
>> > Explicit specifying en_US (as probably the best tested and debugged) makes no difference.