> On 7 Apr 2015, at 00:15, Quincey Morris <quinceymor...@rivergatesoftware.com> 
> wrote:
> 
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>> 
>> Where is my bicycle gone? What am I doing wrong?
> 
> Before this thread heads further into outer space…
> 
> I suspect it [NSCharacterSet] is just broken. Look here, for example:
> 
> http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this
> 
> The problem is that it’s unclear whether the “characters” in NSCharacterSet 
> are internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
> something else. According to the NSCharacterSet documentation:
> 
>> "An NSCharacterSet object represents a set of Unicode-compliant characters.”
> 
> and:
> 
>> "The NSCharacterSet class declares the programmatic interface for an object 
>> that manages a set of Unicode characters (see the NSString class cluster 
>> specification for information on Unicode).”
> 
> According to the NSString documentation:
> 
>> "A string object presents itself as an array of Unicode characters (Unicode 
>> is a registered trademark of Unicode, Inc.). You can determine how many 
>> characters a string object contains with the length method and can retrieve 
>> a specific character with the characterAtIndex: method.”
> 
> Working backwards, we know that the characters that are counted by 
> ‘-[NSString length]’ are UTF-16 code units, so this all *possibly* implies 
> that NSCharacterSet characters are UTF-16 code units, too. Plus, back in the 
> NSCharacterSet documentation:
> 
>> "NSCharacterSet’s principal primitive method, characterIsMember:, provides 
>> the basis for all other instance methods in its interface.”
> 
> If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.
> 
> Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the 
> API was enhanced in 10.2 (see: 
> http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html,
>  for some tantalizing hints about NSCharacterSet), the implementation was a 
> hack that works somehow but isn’t documented. I don’t think you’re going to 
> get any definitive answer except directly from Apple.
> 
> A suggestion, though:
> 
> Try building your character set using ‘characterSetWithRange:’ and/or the 
> NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
> Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
> reasons — when using NSStrings explicitly.

1. longCharacterIsMember: seems to be OK:

        NSCharacterSet *alphanumericCharacterSet = [ NSCharacterSet alphanumericCharacterSet ];
        BOOL pp = [ alphanumericCharacterSet longCharacterIsMember: 0x2f800 ];

returns YES, as it should.

2. characterSetWithCharactersInString: seems to take only the lower 16 bits of 
the code points in the string. Bug.
It works OK, though, if all characters in the string have code points ≥ 0x10000 
(e.g. "𝄞🚲").
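To make "only the lower 16 bits" concrete: 🚲 is U+1F6B2, which NSString stores as the UTF-16 surrogate pair D83D DEB2, while its lower 16 bits alone are 0xF6B2, a Private Use Area code point in the BMP (and 𝄞, U+1D11E, would become U+D11E, a Hangul syllable). The C sketch below (C is also valid Objective-C) only does that arithmetic; utf16_encode and low16 are illustrative names, not Foundation API, and low16 models the suspected bug rather than anything confirmed about Apple's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16 (the units -[NSString length]
   counts). Writes 1 or 2 code units to out and returns how many. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)); /* scalar values only */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one code unit */
        return 1;
    }
    uint32_t v = cp - 0x10000;                   /* 20-bit offset */
    out[0] = (uint16_t)(0xD800 + (v >> 10));     /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));   /* low (trail) surrogate */
    return 2;
}

/* Hypothetical model of the suspected bug: keep only the lower 16 bits,
   which silently maps a supplementary character onto some BMP code point. */
static uint16_t low16(uint32_t cp) {
    return (uint16_t)(cp & 0xFFFF);
}
```

For example, low16(0x1F6B2) is 0xF6B2, whereas utf16_encode(0x1F6B2, u) yields the two units 0xD83D and 0xDEB2.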

3. The documentation for bitmapRepresentation is wrong. It says: "A raw bitmap 
representation of a character set is a byte array of 2^16 bits (that is, 
8192 bytes)."
But the bitmap of alphanumericCharacterSet has 32771 = 0x8003 bytes, which 
mostly look OK.
There are some strange things at the end, though:
        0x2fa1e → 0x2fa2d
        0x30011 → 0x30207
which I do not recognise as alphanumeric.

4. characterSetWithRange: works a bit better:

        NSCharacterSet *a = [ NSCharacterSet characterSetWithRange: NSMakeRange(0x1F6B2,1) ];
        BOOL pp = [ a longCharacterIsMember: 0x1F6B2 ];

returns YES, as it should.

But when I look at the bitmapRepresentation, I see 16385 bytes with two bits 
set: 0x10000 and 0x1f6ba (8 bits off from 0x1f6b2).

It looks like the format of bitmapRepresentation is slightly more complex than 
documented.
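For what it is worth, both sizes match the layout described in the header comment for CFCharacterSetCreateWithBitmapRepresentation in CFCharacterSet.h (not in the NSCharacterSet documentation quoted above): 8192 bytes for the BMP, followed by one 8193-byte block per supplementary plane, each block being a plane-index byte (1 to 16) plus an 8192-byte bitmap. BMP + plane 1 gives 8192 + 8193 = 16385 bytes, and BMP + three planes gives 8192 + 3 × 8193 = 32771 bytes. Under that assumption the "0x10000" bit is simply the plane-index byte (value 1), and the 8-bit offset is that byte's width. Below is a minimal C sketch of a reader for that assumed layout; make_empty_bitmap, add_member and naive_next_bit are hypothetical helpers that rebuild the observed cases by hand without touching NSCharacterSet, and the bit order (byte c >> 3, bit c & 7) is assumed to be the one in the documented BMP membership test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: 8192 bytes for the BMP, then one block per supplementary
   plane that has members: a plane-number byte (1..16) + 8192 bitmap bytes. */
enum { PLANE_BYTES = 8192, BLOCK_BYTES = 1 + PLANE_BYTES };

/* Expected total size for a set with members in n supplementary planes. */
static size_t bitmap_length(size_t n_planes) {
    return PLANE_BYTES + n_planes * BLOCK_BYTES;
}

/* Assumed bit order: code point offset c lives in byte c >> 3, bit c & 7. */
static bool bit_is_set(const uint8_t *plane, uint32_t c) {
    return (plane[c >> 3] >> (c & 7)) & 1u;
}

/* Locate the 8192-byte bitmap for a plane, or NULL if it is absent. */
static const uint8_t *plane_bitmap(const uint8_t *rep, size_t len, uint32_t plane) {
    if (plane == 0) return len >= PLANE_BYTES ? rep : NULL;
    for (size_t pos = PLANE_BYTES; pos + BLOCK_BYTES <= len; pos += BLOCK_BYTES)
        if (rep[pos] == plane) return rep + pos + 1;   /* skip the index byte */
    return NULL;
}

/* Membership test that honours the plane-index bytes. */
static bool bitmap_contains(const uint8_t *rep, size_t len, uint32_t cp) {
    if (cp > 0x10FFFF) return false;
    const uint8_t *bits = plane_bitmap(rep, len, cp >> 16);
    return bits != NULL && bit_is_set(bits, cp & 0xFFFF);
}

/* Hypothetical demo helper: an empty set in the assumed layout, with blocks
   for the listed planes in the order given. Caller frees the result. */
static uint8_t *make_empty_bitmap(const uint8_t *planes, size_t n, size_t *len) {
    *len = bitmap_length(n);
    uint8_t *rep = calloc(*len, 1);
    if (rep == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        assert(planes[i] >= 1 && planes[i] <= 16);
        rep[PLANE_BYTES + i * BLOCK_BYTES] = planes[i];  /* plane-index byte */
    }
    return rep;
}

/* Hypothetical demo helper: set the bit for cp; its plane block must exist. */
static void add_member(uint8_t *rep, size_t len, uint32_t cp) {
    uint8_t *bits = (uint8_t *)plane_bitmap(rep, len, cp >> 16);
    assert(bits != NULL);
    uint32_t c = cp & 0xFFFF;
    bits[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* What reading the whole blob as one flat bit array reports: the index of
   the first set bit at or after start_bit, or UINT32_MAX if there is none. */
static uint32_t naive_next_bit(const uint8_t *rep, size_t len, uint32_t start_bit) {
    for (size_t i = start_bit; i < len * 8; i++)
        if ((rep[i >> 3] >> (i & 7)) & 1u) return (uint32_t)i;
    return UINT32_MAX;
}
```

Reading the three-plane blob as one flat bit array shifts plane 2 by 16 bits (the index bytes of planes 1 and 2), so a member at U+2FA1D shows up as 0x2fa2d. If the layout assumption holds, the "strange" 0x2fa1e → 0x2fa2d run in point 3 would really be U+2FA0E to U+2FA1D, the tail of the CJK Compatibility Ideographs Supplement, and 0x30011 would be a bit of the third plane's index byte rather than a character.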


Kind regards,

Gerriet.


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
