Re: Where is my bicycle?

2015-04-07 Thread Gerriet M. Denkmann

 On 7 Apr 2015, at 00:15, Quincey Morris quinceymor...@rivergatesoftware.com 
 wrote:
 
 On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 A suggestion, though:
 
 Try building your character set using ‘characterSetWithRange:’ and/or the 
 NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
 Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
 reasons — when using NSStrings explicitly.

This turned out to be an excellent idea - it allowed me to create a replacement 
for characterSetWithCharactersInString: which actually works:

//  bug work-around
+ (NSCharacterSet *)gmdCharacterSetWithCharactersInString: (NSString *)string
{
    if ( string.length == 0 )   //  return nil
    {
        NSLog(@"%s string \"%@\" is empty or nil → no CharacterSet.", __FUNCTION__, string);
        return nil;
    };

    NSData *dat = [ string dataUsingEncoding: NSUTF32StringEncoding ];
    const UTF32Char *bytes = dat.bytes;
    NSUInteger length = dat.length / sizeof(UTF32Char);

    NSMutableCharacterSet *mus = [ [ NSMutableCharacterSet alloc ] init ];
    for( NSUInteger i = 1; i < length; i++ )    //  ignore initial kUnicodeByteOrderMark
    {
        UTF32Char codePoint = bytes[i];
        [ mus addCharactersInRange: NSMakeRange( codePoint, 1 ) ];
    };

    return mus;
}
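
For readers following along outside Cocoa, the UTF-32 round trip above amounts to combining each surrogate pair into a single code point before adding it to the set. A minimal sketch in plain C (which also compiles as Objective-C); the function name is made up for illustration, and passing unpaired surrogates through unchanged is an assumption, not necessarily what NSString does:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-16 code units into code points.
   Stores at most out_cap values in out; returns the number of code points found. */
size_t utf16_to_code_points(const uint16_t *units, size_t n,
                            uint32_t *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = units[i];
        /* A high surrogate followed by a low surrogate forms one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            i++;  /* consume the low surrogate as well */
        }
        if (count < out_cap) {
            out[count] = cp;
        }
        count++;
    }
    return count;
}
```

Each decoded value is what would be handed to addCharactersInRange: as a range of length 1.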

Thanks very much for your suggestion!


Kind regards,

Gerriet.


___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Where is my bicycle?

2015-04-07 Thread Quincey Morris
On Apr 7, 2015, at 02:21 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 it allowed me to create a replacement for characterSetWithCharactersInString: 
 which actually works

The only suggestion I have is to return ‘mus.copy’ instead of ‘mus’.

Given that we know NSCharacterSet has some optimized internal representations, 
it’s possible that NSMutableCharacterSet doesn’t use them, since there’s no 
point until you’re “finished” mutating. If you’re using a wide range of UTF-32 
values, the mutable object might be quite large, and taking an immutable copy 
might produce a very much smaller object.

Or not.




Re: Where is my bicycle?

2015-04-07 Thread Charles Srstka
On Apr 7, 2015, at 2:24 PM, Jens Alfke j...@mooseyard.com wrote:
 
 This is the same process that allows you to put Japanese or Cyrillic 
 characters in a string and render them in Helvetica or Papyrus even though 
 those fonts don’t support those character sets.

I really want to see a Cyrillic version of Papyrus now. ;-)

Charles


Re: Where is my bicycle?

2015-04-07 Thread Jens Alfke

 On Apr 7, 2015, at 12:59 PM, Charles Srstka cocoa...@charlessoft.com wrote:
 
 I really want to see a Cyrillic version of Papyrus now. ;-)

http://ihateyouare.deviantart.com/art/Papyrus-Plain-Cyrillic-165111766

You’re welcome :)

—Jens

Re: Where is my bicycle?

2015-04-07 Thread Jens Alfke

 On Apr 6, 2015, at 2:09 PM, Jack Brindle jackbrin...@me.com wrote:
 
 Have you checked the Font you are using to display the character string to 
 see if it contains the bicycle character? If not, you probably won’t get the 
 character you seek.

Fonts have nothing to do with it; they’re an aspect of rendering text, not of 
working with the text in memory. (If it weren’t this way, you wouldn’t be able 
to work with NSString at all; everything would have to be based on 
NSAttributedString to carry around the font info for every character.)

The bicycle is a well-defined Unicode character, an emoji. When it comes time 
to render it, the typesetter will look for a glyph in the current font for that 
character code. It probably won’t find one, so it will go through a series of 
fallback fonts looking for a glyph until it finds one in whatever internal font 
stores the emoji glyphs. Then it uses that font to render it.

This is the same process that allows you to put Japanese or Cyrillic characters 
in a string and render them in Helvetica or Papyrus even though those fonts 
don’t support those character sets. They’re actually being rendered in whatever 
system font is the default for those character sets. This is all invisible to 
you unless you start diving down into the NSTypesetter or CoreText APIs.

—Jens

Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 Where is my bicycle gone? What am I doing wrong?

Before this thread heads further into outer space…

I suspect it [NSCharacterSet] is just broken. Look here, for example:


http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this

The problem is that it’s unclear whether the “characters” in NSCharacterSet are 
internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
something else. According to the NSCharacterSet documentation:

 “An NSCharacterSet object represents a set of Unicode-compliant characters.”

and:

 “The NSCharacterSet class declares the programmatic interface for an object 
 that manages a set of Unicode characters (see the NSString class cluster 
 specification for information on Unicode).”

According to the NSString documentation:

 “A string object presents itself as an array of Unicode characters (Unicode 
 is a registered trademark of Unicode, Inc.). You can determine how many 
 characters a string object contains with the length method and can retrieve a 
 specific character with the characterAtIndex: method.”

Working backwards, we know that the characters that are counted by ‘-[NSString 
length]’ are UTF-16 code units, so this all *possibly* implies that 
NSCharacterSet characters are UTF-16 code units, too. Plus, back in the 
NSCharacterSet documentation:

 “NSCharacterSet’s principal primitive method, characterIsMember:, provides 
 the basis for all other instance methods in its interface.”


If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.

Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the 
API was enhanced in 10.2 (see 
http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html 
for some tantalizing hints about NSCharacterSet), the implementation was a 
hack that works somehow but isn’t documented. I don’t think you’re going to get 
any definitive answer except directly from Apple.

A suggestion, though:

Try building your character set using ‘characterSetWithRange:’ and/or the 
NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
reasons — when using NSStrings explicitly.





Re: Where is my bicycle?

2015-04-06 Thread Paul Scott
On Apr 6, 2015, at 9:57 AM, Charles Srstka cocoa...@charlessoft.com wrote:
 
 The problem, then, is likely the fact that NSCharacterSet considers a 
 “character” simply as a UTF-16 code point, rather than a true Unicode 
 character as Swift does.

That should not matter. UTF-16 is a variable length encoding. It is guaranteed 
to support all 1,112,064 possible Unicode characters. In order to do that it 
MUST be variable length, either 2-octets or 4-octets.

This appears to be a bug in the Objective-C handling of UTF-16.
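
The figure of 1,112,064 and the "2-octets or 4-octets" rule can be checked with a little arithmetic. A minimal sketch in plain C, with function names invented for illustration: the count is all 0x110000 code points minus the 0x800 surrogate values, and anything at or above 0x10000 is split into a surrogate pair.

```c
#include <stdint.h>

/* Number of Unicode scalar values: every code point minus the surrogates. */
uint32_t unicode_scalar_count(void)
{
    return 0x110000u - 0x800u;  /* 1,114,112 - 2,048 */
}

/* Encode one scalar value as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is not encodable. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;  /* out of range, or a bare surrogate value */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* one unit: 2 octets */
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;                                    /* two units: 4 octets */
}
```

For the bicycle, U+1F6B2, this yields the pair 0xD83D 0xDEB2.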


Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

 On 7 Apr 2015, at 00:15, Quincey Morris quinceymor...@rivergatesoftware.com 
 wrote:
 
 A suggestion, though:
 
 Try building your character set using ‘characterSetWithRange:’ and/or the 
 NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
 Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
 reasons — when using NSStrings explicitly.

1. longCharacterIsMember seems to be ok:
    NSCharacterSet *alphanumericCharacterSet = [ NSCharacterSet alphanumericCharacterSet ];
    BOOL pp = [ alphanumericCharacterSet longCharacterIsMember: 0x2f800 ];
returns YES as it should.

2. characterSetWithCharactersInString seems to take only the lower 16 bits of 
the code points in the string. Bug.
Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 
𝄞)

3. the documentation about bitmapRepresentation is wrong. It says: “A raw 
bitmap representation of a character set is a byte array of 2^16 bits (that is, 
8192 bytes).”
But alphanumericCharacterSet has a bitmap with 32771 = 0x8003 bytes, which 
mostly look ok.
It has some strange things though at the end: 
0x2fa1e → 0x2fa2d 
0x30011 → 0x30207 
which I do not recognise as alphanumeric.

4. characterSetWithRange works a bit better:
    NSCharacterSet *a = [ NSCharacterSet characterSetWithRange: NSMakeRange(0x1F6B2,1) ];
    BOOL pp = [ a longCharacterIsMember: 0x1F6B2 ]; → returns YES as it should.

But when I look at the bitmapRepresentation I see 16385 bytes with two bits 
set: 0x10000 and 0x1f6ba (8 bits off)

Looks like the format of the bitmapRepresentation is slightly more complex than 
documented.
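
One reading that fits the sizes reported above: 16385 = 8192 + 1 + 8192, and 32771 = 8192 + 3 × 8193. That would be an 8192-byte bitmap for plane 0, followed for each further plane by one index byte and another 8192 bytes. A single index byte would also explain a bit that appears 8 positions late. The sketch below, in plain C, tests membership under that assumed layout; the layout, the least-significant-bit-first order, and the function name are all assumptions inferred from these numbers, not taken from the documentation quoted here.

```c
#include <stddef.h>
#include <stdint.h>

#define PLANE_BYTES 8192u  /* 2^16 bits */

/* Returns 1 if code point cp is set in a bitmap assumed to be laid out as:
   [8192 bytes for plane 0], then per extra plane [1 index byte][8192 bytes].
   Both the layout and the bit order within a byte are assumptions. */
int bitmap_has_code_point(const uint8_t *bytes, size_t len, uint32_t cp)
{
    uint32_t plane = cp >> 16;
    uint32_t offset_in_plane = cp & 0xFFFF;
    const uint8_t *plane_bits = NULL;

    if (plane == 0) {
        if (len >= PLANE_BYTES) {
            plane_bits = bytes;
        }
    } else {
        /* Walk the [index byte][8192 bytes] records after plane 0. */
        size_t pos = PLANE_BYTES;
        while (pos + 1 + PLANE_BYTES <= len) {
            if (bytes[pos] == plane) {
                plane_bits = bytes + pos + 1;
                break;
            }
            pos += 1 + PLANE_BYTES;
        }
    }
    if (plane_bits == NULL) {
        return 0;
    }
    return (plane_bits[offset_in_plane >> 3] >> (offset_in_plane & 7)) & 1;
}
```

Under this reading, the index byte for plane 1 sits at byte 8192 with value 1, and the bicycle's bit lands at linear bit position 0x1F6B2 + 8 = 0x1F6BA.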


Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

 On Apr 6, 2015, at 10:15 AM, Quincey Morris 
 quinceymor...@rivergatesoftware.com wrote:
 
 On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 Where is my bicycle gone? What am I doing wrong?
 
 The problem is that it’s unclear whether the “characters” in NSCharacterSet 
 are internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
 something else. According to the NSCharacterSet documentation:

I'm not an expert here, but my understanding is that when Cocoa says 
"character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. 
Cocoa's string API was designed when Unicode was still a true 16-bit character set.
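
A length counted in UTF-16 code units can be computed from code points without any Cocoa API. A minimal sketch in plain C, assuming the input holds only valid scalar values; the function name is invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-16 code units needed for n code points.
   Assumes every value is a valid Unicode scalar value. */
size_t utf16_length(const uint32_t *code_points, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        /* Code points at or above 0x10000 need a surrogate pair. */
        units += (code_points[i] >= 0x10000) ? 2 : 1;
    }
    return units;
}
```

A single code point outside the Basic Multilingual Plane, such as U+1F6B2, therefore counts as 2.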


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 12:29 , Greg Parker gpar...@apple.com wrote:
 
 my understanding is that when Cocoa says "character" it usually means "UTF-16 
 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when 
 Unicode was still a true 16-bit character set.

I would have said so, too, except that NSCharacterSet has this 
‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the 
parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: 
(unichar)’ is apparently for.




Re: Where is my bicycle?

2015-04-06 Thread Jack Brindle
Have you checked the Font you are using to display the character string to see 
if it contains the bicycle character? If not, you probably won’t get the 
character you seek.

- Jack


Re: Where is my bicycle?

2015-04-06 Thread Michael Crawford
If you're unable to do what you need with Cocoa, maybe it would work to use ICU.


Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/

   Available for Software Development in the Portland, Oregon Metropolitan
Area.


On Mon, Apr 6, 2015 at 4:57 PM, Quincey Morris
quinceymor...@rivergatesoftware.com wrote:
 On Apr 6, 2015, at 16:29 , pscott psc...@skycoast.us wrote:

 But what you were describing *would* be UCS-2. To claim UTF-16 support, 
 variable length encoding must be handled.

 It's pretty much understood -- on this list -- that NSString is based on 
 UTF-16, so we tend to cut the corner that's bothering you. This is 
 complicated by the fact that NSString is a bit weird. Its underlying 
 representation is UTF-16 strings, but its API is "array of UTF-16 code 
 units". That means you can create an invalid UTF-16 string with the NSString 
 API. The fact that we're not supposed to do that is also pretty much 
 understood.

 This messiness, along with the use of the ambiguous word "character" or 
 "Unicode character" in the documentation, is all for historical reasons.

 NSCharacterSet is something else again. We don't actually know whether:

 -- it's implemented as a set of UTF-16 code units, instead of code points

 -- it handles UTF-16 surrogate pairs properly, in which of its API methods

 -- it handles UTF-32 code units properly, in which of its API methods

 -- it has bugs that prevent it from doing what it's intended to do, whatever 
 that is

 Greg has basically given us the answers: "not code units", "possibly", "it's 
 supposed to", and "probably". :)
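
What "an invalid UTF-16 string" means for an array of code units can be made concrete. A minimal sketch in plain C, with an invented function name, that accepts a sequence only if every surrogate belongs to a proper high-then-low pair:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every surrogate in units[0..n) is part of a high+low pair, else 0. */
int utf16_is_well_formed(const uint16_t *units, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* A high surrogate must be followed by a low surrogate. */
            if (i + 1 >= n || units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF) {
                return 0;
            }
            i++;  /* skip the low surrogate just checked */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return 0;  /* low surrogate with no preceding high surrogate */
        }
    }
    return 1;
}
```

Cutting a string between the two halves of a pair, which an index-based API permits, produces exactly the kind of sequence this rejects.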




Re: Where is my bicycle?

2015-04-06 Thread Michael Crawford
Your bicycle showed up in my GMail in Firefox on Yosemite, but not in
Safari on my Mom's iMac running Tiger.
Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/

   Available for Software Development in the Portland, Oregon Metropolitan
Area.


On Mon, Apr 6, 2015 at 2:09 PM, Jack Brindle jackbrin...@me.com wrote:
 Have you checked the Font you are using to display the character string to 
 see if it contains the bicycle character? If not, you probably won’t get the 
 character you seek.

 - Jack


Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

 On Apr 6, 2015, at 2:20 PM, pscott psc...@skycoast.us wrote:
 
 On 4/6/2015 12:29 PM, Greg Parker wrote:
 I'm not an expert here, but my understanding is that when Cocoa says 
 "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for 
 example. Cocoa's string API was designed when Unicode was still a true 16-bit 
 character set.
 
 That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 
 characters isn't supported it should not be documented as supporting UTF-16.

No, it's not UCS-2. The API generally works as if it were manipulating an array 
of UTF-16 code units. @"🚲" displays correctly; it would not if the system were 
truly UCS-2. 


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 12:29 PM, Greg Parker wrote:
I'm not an expert here, but my understanding is that when Cocoa says 
"character" it usually means "UTF-16 code unit". @"🚲".length == 2, 
for example. Cocoa's string API was designed when Unicode was still a true 
16-bit character set.


That would be UCS-2 encoding. If the full Unicode character set of 
1,112,064 characters isn't supported it should not be documented as 
supporting UTF-16.


Paul





Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

 On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 2. characterSetWithCharactersInString seems to take only the lower 16 bits of 
 the code points in the string. Bug.
 Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 
 𝄞)

The implementation of +characterSetWithCharactersInString: does attempt to 
handle arbitrary code points. It tries to optimize strings that have no large 
code points; my guess is that it has a bug when the string has a mix of both. 
Please file a bug report.
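
To picture the reported symptom: keeping only the lower 16 bits turns a code point outside the Basic Multilingual Plane into an unrelated 16-bit value. A minimal sketch in plain C with an invented function name; whether the framework actually truncates this way is only what the observation above suggests, not something confirmed here.

```c
#include <stdint.h>

/* What "taking only the lower 16 bits" does to a code point. */
uint16_t truncate_to_16_bits(uint32_t code_point)
{
    return (uint16_t)(code_point & 0xFFFF);
}
```

For the bicycle, U+1F6B2 becomes 0xF6B2, which lies in the private use area (U+E000 to U+F8FF), so a set built this way would contain a different character entirely. Code points below 0x10000 are unaffected.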


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 4:03 PM, Greg Parker wrote:

On Apr 6, 2015, at 2:20 PM, pscott psc...@skycoast.us wrote:

On 4/6/2015 12:29 PM, Greg Parker wrote:
I'm not an expert here, but my understanding is that when Cocoa says "character" it usually means 
"UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when 
Unicode was still a true 16-bit character set.

That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 
characters isn't supported it should not be documented as supporting UTF-16.

No, it's not UCS-2. The API generally works as if it were manipulating an array of UTF-16 
code units. @"🚲" displays correctly; it would not if the system were truly 
UCS-2.
Right. But what you were describing *would* be UCS-2. To claim UTF-16 
support, variable length encoding must be handled. You cannot 
legitimately claim UTF-16 support by only handling a fixed-size encoding 
(i.e., a single code unit). So, if UTF-16 support is intended, as 
documented, then there has to be a bug, as I stated in an earlier post; 
and as you did also in a later post (albeit with a different reasoning).






Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 4:29 PM, pscott wrote:

On 4/6/2015 4:03 PM, Greg Parker wrote:

On Apr 6, 2015, at 2:20 PM, pscott psc...@skycoast.us wrote:

On 4/6/2015 12:29 PM, Greg Parker wrote:
I'm not an expert here, but my understanding is that when Cocoa 
says "character" it usually means "UTF-16 code unit". @"🚲".length 
== 2, for example. Cocoa's string API was designed when Unicode was 
still a true 16-bit character set.

That would be UCS-2 encoding. If the full Unicode character set of 
1,112,064 characters isn't supported it should not be documented as 
supporting UTF-16.

No, it's not UCS-2. The API generally works as if it were 
manipulating an array of UTF-16 code units. @"🚲" displays correctly; 
it would not if the system were truly UCS-2.

Right. But what you were describing *would* be UCS-2. To claim UTF-16 
support, variable length encoding must be handled. You cannot 
legitimately claim UTF-16 support by only handling a fixed-size 
encoding (i.e., a single code unit). So, if UTF-16 support is 
intended, as documented, then there has to be a bug, as I stated in an 
earlier post; and as you did also in a later post (albeit with a 
different reasoning).


Greg, I went back and re-read your post a number of times, and after the 
third reading, I got your meaning. I see now you were describing a 
variable length encoding. My apology for misunderstanding your point.





Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 16:29 , pscott psc...@skycoast.us wrote:
 
 But what you were describing *would* be UCS-2. To claim UTF-16 support, 
 variable length encoding must be handled.

It’s pretty much understood — on this list — that NSString is based on UTF-16, 
so we tend to cut the corner that’s bothering you. This is complicated by the 
fact that NSString is a bit weird. Its underlying representation is UTF-16 
strings, but its API is “array of UTF-16 code units”. That means you can create 
an invalid UTF-16 string with the NSString API. The fact that we’re not 
supposed to do that is also pretty much understood.
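
To illustrate what "invalid UTF-16" means for an array of code units: a high surrogate must be followed directly by a low surrogate, and a low surrogate must never appear on its own. The following is only a sketch in plain C, not anything NSString does internally; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the code units form well-formed
   UTF-16, i.e. every high surrogate (0xD800..0xDBFF) is immediately
   followed by a low surrogate (0xDC00..0xDFFF) and no low surrogate
   stands alone. */
bool utf16_is_well_formed(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 1 >= count || units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF)
                return false;
            i++;    /* skip the low half of the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;   /* low surrogate without a preceding high one */
        }
    }
    return true;
}
```

Cutting a string between the two halves of a pair (for example with a substring range that ends after the high surrogate) is the typical way to end up with an array this check rejects.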

This messiness, along with the use of the ambiguous word “character” or 
“Unicode character” in the documentation, is all for historical reasons.

NSCharacterSet is something else again. We don’t actually know whether:

— it’s implemented as a set of UTF-16 code units, instead of code points

— it handles UTF-16 surrogate pairs properly, in which of its API methods

— it handles UTF-32 code units properly, in which of its API methods

— it has bugs that prevent it from doing what it’s intended to do, whatever 
that is

Greg has basically given us the answers: “not code units”, “possibly”, “it’s 
supposed to”, and “probably”. :)




Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

 On 7 Apr 2015, at 05:44, Greg Parker gpar...@apple.com wrote:
 
 
 On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann gerr...@mdenkmann.de 
 wrote:
 
 2. characterSetWithCharactersInString seems to take only the lower 16 bits 
 of the code points in the string. Bug.
 Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 
 𝄞)
 
 The implementation of +characterSetWithCharactersInString: does attempt to 
 handle arbitrary code points. It tries to optimize strings that have no 
 large code points; my guess is that it has a bug when the string has a mix 
 of both. Please file a bug report.

Done: 20444816

Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

 On 7 Apr 2015, at 03:42, Quincey Morris quinceymor...@rivergatesoftware.com 
 wrote:
 
 On Apr 6, 2015, at 12:29 , Greg Parker gpar...@apple.com wrote:
 
 my understanding is that when Cocoa says "character" it usually means 
 "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API 
 was designed when Unicode was still a true 16-bit character set.
 
 I would have said so, too, except that NSCharacterSet has this 
 ‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the 
 parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: 
 (unichar)’ is apparently for.

Well, it is really quite simple:
NSString (and others) mean by "character": an unsigned short in UTF-16 
representation, i.e. a UTF-16 code unit.
But "LongCharacter" means: a Unicode code point.

Both definitions were the same in Unicode 1.0 (up to about 1996) when Unicode 
was 16 bits only. Starting with 2.0 it became 21 bits.
They are still the same for code points below 0x10000, i.e. Plane 0, or Basic 
Multilingual Plane.
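
A small sketch of that relationship in plain C, assuming nothing beyond the UTF-16 encoding rules (the function name is hypothetical): a code point in Plane 0 becomes one code unit with the same value, anything above becomes a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-16 code units for code point `cp`
   into `out` (room for 2) and returns how many units were written. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);                 /* must lie inside Unicode */
    assert(cp < 0xD800 || cp > 0xDFFF);     /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;              /* Plane 0: code unit == code point */
        return 1;
    }
    uint32_t v = cp - 0x10000;              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (v >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

So U+3004 stays the single unit 0x3004, while BICYCLE (U+1F6B2) becomes the two units 0xD83D 0xDEB2, which is why a string holding just that one character reports a length of 2.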

Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

 On 6 Apr 2015, at 23:36, Charles Srstka cocoa...@charlessoft.com wrote:
 
 
 On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann gerr...@mdenkmann.de 
 wrote:
 
 OS X 10.10.2
 
  NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
  NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
  BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
  NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 (0x1F6B2)", __FUNCTION__, string, pq ? "" : "no " );
 
 This prints:
 
 CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
 
 Where is my bicycle gone? What am I doing wrong?
 
 Objective-C doesn’t support Unicode in source files (although Swift does).
 
 Charles
 

If this is so: why did my compiler not tell me about this?

Why does this:
NSString *string = @"abc 〄 xyz";    // JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004
NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
BOOL pq = [ charSet longCharacterIsMember: 0x3004 ];
NSLog(@"%s CharacterSet from \"%@\" contains %s〄 (0x3004)", __FUNCTION__, string, pq ? "" : "no " );


print: 
CharacterSet from "abc 〄 xyz" contains 〄 (0x3004)


Kind regards,

Gerriet.




Re: Where is my bicycle?

2015-04-06 Thread Sean McBride
On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said:

Objective-C doesn’t support Unicode in source files (although Swift does).

Yes it does, and it has for many years too.

Cheers,

-- 

Sean McBride, B. Eng s...@rogue-research.com
Rogue Research  www.rogue-research.com
Mac Software Developer  Montréal, Québec, Canada


Re: Where is my bicycle?

2015-04-06 Thread Charles Srstka
 
 On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 OS X 10.10.2
 
   NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
   NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
   BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
   NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 (0x1F6B2)", __FUNCTION__, string, pq ? "" : "no " );
 
 This prints:
 
 CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
 
 Where is my bicycle gone? What am I doing wrong?

Objective-C doesn’t support Unicode in source files (although Swift does).

Charles



Re: Where is my bicycle?

2015-04-06 Thread Steve Mills
On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 
 NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
 
 
 If this is so: why did my compiler not tell me about this?
 
   NSString *string = @"abc 〄 xyz";    // JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004

Perhaps because U+3004 is a 2-byte value and U+1F6B2 is not? Have you tried 
changing the encoding of your source file to something else? You'll probably 
have to store such strings in .strings or hardcode the hex values and build the 
strings from that.
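
If the literal in the source file really were the problem, "hardcode the hex values" could look roughly like the sketch below: plain C that turns a code point into its UTF-8 bytes. The function name is hypothetical, and this is only one way to do it; a universal character name such as \U0001F6B2 inside the literal is another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes code point `cp` as UTF-8 into `out`
   (room for 4 bytes) and returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {                        /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: rest of Plane 0 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));    /* 4 bytes: Planes 1..16 */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

The resulting bytes could then be handed to -[NSString initWithBytes:length:encoding:] with NSUTF8StringEncoding.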

--
Steve Mills
Drummer, Mac geek



Re: Where is my bicycle?

2015-04-06 Thread Charles Srstka
On Apr 6, 2015, at 11:49 AM, Sean McBride s...@rogue-research.com wrote:
 
 On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said:
 
 Objective-C doesn’t support Unicode in source files (although Swift does).
 
 Yes it does, and it has for many years too.

Huh, I just checked the documentation, and you’re right, they appear to have 
changed that at some point. It definitely used to say in there that using 
Unicode in a source file was officially verboten, although it usually 
accidentally worked anyway.

The problem, then, is likely the fact that NSCharacterSet considers a 
“character” simply as a UTF-16 code unit, rather than a true Unicode character 
as Swift does.

Sorry for the noise.

Charles



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

 On 6 Apr 2015, at 23:52, Steve Mills sjmi...@mac.com wrote:
 
 On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
 
 
 NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
 
 
 If this is so: why did my compiler not tell me about this?
 
  NSString *string = @"abc 〄 xyz";    // JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004
 
 Perhaps because U+3004 is a 2-byte value and U+1F6B2 is not? Have you tried 
 changing the encoding of your source file to something else? You'll probably 
 have to store such strings in .strings or hardcode the hex values and build 
 the strings from that.

You are right: 

My string looks like:
 string "abc 🚲 xyz" contains:
char[  0] = 0x00061
char[  1] = 0x00062
char[  2] = 0x00063
char[  3] = 0x00020
char[  4] = 0x1f6b2 ← this is a bicycle
char[  5] = 0x00020
char[  6] = 0x00078
char[  7] = 0x00079
char[  8] = 0x0007a

which seems ok.

But when I print the bits in NSCharacterSet bitmapRepresentation I get:

 bit[  1] = 0x00020 =  
 bit[  2] = 0x00061 = a
 bit[  3] = 0x00062 = b
 bit[  4] = 0x00063 = c
 bit[  5] = 0x00078 = x
 bit[  6] = 0x00079 = y
 bit[  7] = 0x0007a = z
 bit[  8] = 0x0f6b2 =    ← this should be 0x1f6b2, which is a bicycle.

Looks like there is a bug in characterSetWithCharactersInString, or not?
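
The dump is consistent with the code point having been cut down to its lower 16 bits somewhere along the way. That is a guess about the symptom, not a statement about how Foundation is implemented. A plain C sketch (hypothetical function names) of the truncation, next to the surrogate-pair decoding that would give the right answer:

```c
#include <assert.h>
#include <stdint.h>

/* What the dump suggests happened: only the lower 16 bits survive. */
uint16_t truncate_to_16_bits(uint32_t cp)
{
    return (uint16_t)(cp & 0xFFFF);
}

/* Hypothetical helper: recombines a UTF-16 surrogate pair into the
   code point it stands for. */
uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000u + ((uint32_t)(high - 0xD800) << 10) + (uint32_t)(low - 0xDC00);
}
```

Truncating 0x1F6B2 gives 0xF6B2, the value seen in the bitmap, whereas decoding the pair 0xD83D 0xDEB2 stored in the string gives back 0x1F6B2. Code points in Plane 0 such as 0x3004 pass through the truncation unchanged, which would explain why that test succeeded.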

Kind regards,

Gerriet.


