Re: Where is my bicycle?
On 7 Apr 2015, at 00:15, Quincey Morris quinceymor...@rivergatesoftware.com wrote:
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
> A suggestion, though: Try building your character set using ‘characterSetWithRange:’ and/or the NSMutableCharacterSet methods that add ranges, instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility reasons — when using NSStrings explicitly.

This turned out to be an excellent idea: it allowed me to create a replacement for characterSetWithCharactersInString: which actually works:

    #import <Foundation/Foundation.h>

    // bug work-around
    // Class method intended for a category on NSCharacterSet (category declaration not shown).
    + (NSCharacterSet *)gmdCharacterSetWithCharactersInString: (NSString *)string
    {
        if ( string.length == 0 )    // return nil
        {
            NSLog(@"%s string \"%@\" is empty or nil → no CharacterSet.", __FUNCTION__, string);
            return nil;
        }

        // The UTF-32 data is expected to start with a byte order mark, hence i = 1 below.
        NSData *dat = [ string dataUsingEncoding: NSUTF32StringEncoding ];
        if ( dat == nil ) return nil;    // string could not be converted
        const UTF32Char *bytes = dat.bytes;
        NSUInteger length = dat.length / sizeof(UTF32Char);

        NSMutableCharacterSet *mus = [ [ NSMutableCharacterSet alloc ] init ];
        for ( NSUInteger i = 1; i < length; i++ )    // ignore initial kUnicodeByteOrderMark
        {
            UTF32Char codePoint = bytes[i];
            // Add each full code point as a one-element range.
            [ mus addCharactersInRange: NSMakeRange( codePoint, 1 ) ];
        }
        return mus;
    }

Thanks very much for your suggestion!

Kind regards,

Gerriet.

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list. Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
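The loop above starts at index 1 on the assumption that -dataUsingEncoding:NSUTF32StringEncoding always puts a byte order mark first. As a minimal C sketch (the function name utf32_code_points is hypothetical, not part of Foundation), the same walk over a host-order UTF-32 buffer can check for the BOM instead of assuming it, so a buffer without one does not silently lose its first character:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the code points of a host-order UTF-32 buffer
   into `out`, skipping a leading byte order mark only if one is present.
   Returns the number of code points written (at most out_cap). */
size_t utf32_code_points(const uint32_t *units, size_t count,
                         uint32_t *out, size_t out_cap)
{
    size_t start = 0;
    if (count > 0 && units[0] == 0xFEFFu)   /* BOM in host byte order */
        start = 1;

    size_t n = 0;
    for (size_t i = start; i < count && n < out_cap; i++)
        out[n++] = units[i];                /* one UTF-32 unit = one code point */
    return n;
}
```

A byte-swapped BOM (0xFFFE0000) is not handled; the sketch assumes the data is already in host byte order.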
Re: Where is my bicycle?
On Apr 7, 2015, at 02:21 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
> it allowed me to create a replacement for characterSetWithCharactersInString: which actually works

The only suggestion I have is to return ‘mus.copy’ instead of ‘mus’. Given that we know NSCharacterSet has some optimized internal representations, it’s possible that NSMutableCharacterSet doesn’t use them, since there’s no point until you’re “finished” mutating. If you’re using a wide range of UTF-32 values, the mutable object might be quite large, and taking an immutable copy might produce a very much smaller object. Or not.
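To illustrate the kind of saving Quincey has in mind, here is a toy C sketch of one possible compaction step: collapsing a sorted list of individually added code points into contiguous ranges. This is an assumption for illustration only; NSCharacterSet's actual internal representations are not documented, and coalesce_ranges and cp_range are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* A closed range of code points [first, last]. */
typedef struct { uint32_t first, last; } cp_range;

/* Toy compaction: turns a sorted, duplicate-free list of code points into
   the minimal list of contiguous ranges. Returns the number of ranges.
   Not NSCharacterSet's real format, just an illustration. */
size_t coalesce_ranges(const uint32_t *cps, size_t count,
                       cp_range *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (n > 0 && cps[i] == out[n - 1].last + 1) {
            out[n - 1].last = cps[i];             /* extend the current range */
        } else if (n < out_cap) {
            out[n].first = out[n].last = cps[i];  /* start a new range */
            n++;
        } else {
            break;                                /* no room for another range */
        }
    }
    return n;
}
```

Twenty-six entries for A to Z collapse into a single range, while an isolated code point such as U+1F6B2 still costs one range of its own.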
Re: Where is my bicycle?
On Apr 7, 2015, at 2:24 PM, Jens Alfke j...@mooseyard.com wrote:
> This is the same process that allows you to put Japanese or Cyrillic characters in a string and render them in Helvetica or Papyrus even though those fonts don’t support those character sets.

I really want to see a Cyrillic version of Papyrus now. ;-)

Charles
Re: Where is my bicycle?
On Apr 7, 2015, at 12:59 PM, Charles Srstka cocoa...@charlessoft.com wrote:
> I really want to see a Cyrillic version of Papyrus now. ;-)

http://ihateyouare.deviantart.com/art/Papyrus-Plain-Cyrillic-165111766

You’re welcome :)

—Jens
Re: Where is my bicycle?
On Apr 6, 2015, at 2:09 PM, Jack Brindle jackbrin...@me.com wrote:
> Have you checked the Font you are using to display the character string to see if it contains the bicycle character? If not, you probably won’t get the character you seek.

Fonts have nothing to do with it; they’re an aspect of rendering text, not of working with the text in memory. (If it weren’t this way, you wouldn’t be able to work with NSString at all; everything would have to be based on NSAttributedString to carry around the font info for every character.)

The bicycle is a well-defined Unicode character, an emoji. When it comes time to render it, the typesetter will look for a glyph in the current font for that character code. It probably won’t find one, so it will go through a series of fallback fonts looking for a glyph until it finds one in whatever internal font stores the emoji glyphs. Then it uses that font to render it.

This is the same process that allows you to put Japanese or Cyrillic characters in a string and render them in Helvetica or Papyrus even though those fonts don’t support those character sets. They’re actually being rendered in whatever system font is the default for those character sets. This is all invisible to you unless you start diving down into the NSTypesetter or CoreText APIs.

—Jens
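Jens's description of fallback can be modelled as walking a chain of fonts until one has a glyph for the code point. The following C sketch is a toy model under that assumption: the fonts and their coverage ranges are hypothetical, and the real lookup happens inside CoreText and NSTypesetter, not in code like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: each "font" covers a list of code-point ranges.
   Coverage data is hypothetical, not taken from any real font. */
typedef struct { uint32_t first, last; } coverage_range;

typedef struct {
    const char *name;
    const coverage_range *ranges;
    size_t range_count;
} toy_font;

/* Returns 1 if the font has a glyph for `cp` in this toy model. */
static int font_covers(const toy_font *f, uint32_t cp)
{
    for (size_t i = 0; i < f->range_count; i++)
        if (cp >= f->ranges[i].first && cp <= f->ranges[i].last)
            return 1;
    return 0;
}

/* Walks the chain (requested font first, then fallbacks) and returns the
   first font that covers `cp`, or NULL if none does. */
const toy_font *pick_font(const toy_font *chain, size_t n, uint32_t cp)
{
    for (size_t i = 0; i < n; i++)
        if (font_covers(&chain[i], cp))
            return &chain[i];
    return NULL;
}
```

The string itself never changes; only the choice of font for each character does, which is why the bicycle can be stored correctly even when the requested font lacks it.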
Re: Where is my bicycle?
On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
> Where is my bicycle gone? What am I doing wrong?

Before this thread heads further into outer space… I suspect it [NSCharacterSet] is just broken. Look here, for example:

http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this

The problem is that it’s unclear whether the “characters” in NSCharacterSet are internally UTF-16 code units, UTF-32 code units, Unicode code points, or something else. According to the NSCharacterSet documentation:

“An NSCharacterSet object represents a set of Unicode-compliant characters.”

and:

“The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode).”

According to the NSString documentation:

“A string object presents itself as an array of Unicode characters (Unicode is a registered trademark of Unicode, Inc.). You can determine how many characters a string object contains with the length method and can retrieve a specific character with the characterAtIndex: method.”

Working backwards, we know that the characters that are counted by ‘-[NSString length]’ are UTF-16 code units, so this all *possibly* implies that NSCharacterSet characters are UTF-16 code units, too. Plus, back in the NSCharacterSet documentation:

“NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface.”

If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.

Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the API was enhanced in 10.2 (see http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html for some tantalizing hints about NSCharacterSet), the implementation was a hack that works somehow but isn’t documented. I don’t think you’re going to get any definitive answer except directly from Apple.

A suggestion, though: Try building your character set using ‘characterSetWithRange:’ and/or the NSMutableCharacterSet methods that add ranges, instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility reasons — when using NSStrings explicitly.
Re: Where is my bicycle?
On Apr 6, 2015, at 9:57 AM, Charles Srstka cocoa...@charlessoft.com wrote:
> The problem, then, is likely the fact that NSCharacterSet considers a “character” simply as a UTF-16 code point, rather than a true Unicode character as Swift does.

That should not matter. UTF-16 is a variable-length encoding. It is guaranteed to support all 1,112,064 possible Unicode characters. In order to do that it MUST be variable length, either 2 octets or 4 octets. This appears to be a bug in the Objective-C handling of UTF-16.
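The variable-length point is easiest to see in a decoder. This C sketch (utf16_decode is a hypothetical helper, not a Foundation API) combines a high surrogate and the low surrogate that follows it into one supplementary code point, the 4-octet case, and maps any unpaired surrogate to U+FFFD:

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes UTF-16 code units into code points. A high surrogate followed by
   a low surrogate becomes one supplementary code point; an unpaired
   surrogate becomes U+FFFD. Returns the number of code points written. */
size_t utf16_decode(const uint16_t *u, size_t len, uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len && n < out_cap; i++) {
        uint32_t c = u[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            /* 2 code units (4 octets) -> 1 code point */
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(u[i + 1] - 0xDC00);
            i++;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;   /* lone surrogate: not a valid character */
        }
        out[n++] = c;
    }
    return n;
}
```

Decoding the three code units 0x0041 0xD83D 0xDEB2 yields two code points, U+0041 and U+1F6B2 (the bicycle).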
Re: Where is my bicycle?
On 7 Apr 2015, at 00:15, Quincey Morris quinceymor...@rivergatesoftware.com wrote:
> A suggestion, though: Try building your character set using ‘characterSetWithRange:’ and/or the NSMutableCharacterSet methods that add ranges, instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility reasons — when using NSStrings explicitly.

1. longCharacterIsMember: seems to be OK:

    NSCharacterSet *alphanumericCharacterSet = [ NSCharacterSet alphanumericCharacterSet ];
    BOOL pp = [ alphanumericCharacterSet longCharacterIsMember: 0x2f800 ];

returns YES, as it should.

2. characterSetWithCharactersInString: seems to take only the lower 16 bits of the code points in the string. Bug. It works OK, though, if all chars in the string have code points ≥ 0x10000 (e.g. 𝄞).

3. The documentation about bitmapRepresentation is wrong. It says: “A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes).” But alphanumericCharacterSet has a bitmap of 32771 = 0x8003 bytes, which mostly looks OK. There are some strange things at the end, though:

    0x2fa1e → 0x2fa2d
    0x30011 → 0x30207

which I do not recognise as alphanumeric.

4. characterSetWithRange: works a bit better:

    NSCharacterSet *a = [ NSCharacterSet characterSetWithRange: NSMakeRange(0x1F6B2,1) ];
    BOOL pp = [ a longCharacterIsMember: 0x1F6B2 ];

returns YES, as it should. But when I look at the bitmapRepresentation I see 16385 bytes with two bits set: 0x10000 and 0x1f6ba (8 bits off). Looks like the format of the bitmapRepresentation is slightly more complex than documented.

Kind regards,

Gerriet.
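Gerriet's byte counts fit a layout in which the first 8192 bytes cover the BMP and each further plane adds 8193 bytes: a one-byte plane index followed by an 8192-byte bitmap (16385 = 8192 + 1 + 8192, and 32771 = 8192 + 3 × 8193). That would also explain the "8 bits off": read as one flat bit array, the plane-index byte (value 0x01) appears as bit 0x10000 and shifts plane 1's bits by 8. The C sketch below assumes that inferred layout; it is not taken from Apple's documentation, and bitmap_contains is a hypothetical name. Within a plane it uses the documented bit order, bit n at byte n >> 3, position n & 7.

```c
#include <stddef.h>
#include <stdint.h>

#define PLANE_BYTES 8192u   /* 2^16 bits: one plane's bitmap */

/* Membership test against a bitmap in the layout inferred above (an
   assumption): BMP bitmap first, then for each supplementary plane one
   plane-index byte followed by that plane's 8192-byte bitmap. */
int bitmap_contains(const uint8_t *bm, size_t len, uint32_t cp)
{
    uint32_t plane = cp >> 16, offset = cp & 0xFFFFu;
    const uint8_t *block = NULL;

    if (plane == 0) {
        if (len >= PLANE_BYTES) block = bm;
    } else {
        /* scan the (plane byte + bitmap) records after the BMP block */
        for (size_t pos = PLANE_BYTES; pos + 1 + PLANE_BYTES <= len;
             pos += 1 + PLANE_BYTES) {
            if (bm[pos] == plane) { block = bm + pos + 1; break; }
        }
    }
    return block != NULL && (block[offset >> 3] & (1u << (offset & 7))) != 0;
}
```

Under this layout, a set holding only U+1F6B2 has its bit at flat bit index 8193 × 8 + 0xF6B2 = 0x1F6BA, the value Gerriet observed.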
Re: Where is my bicycle?
On Apr 6, 2015, at 10:15 AM, Quincey Morris quinceymor...@rivergatesoftware.com wrote:
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
>> Where is my bicycle gone? What am I doing wrong?
> The problem is that it’s unclear whether the “characters” in NSCharacterSet are internally UTF-16 code units, UTF-32 code units, Unicode code points, or something else. According to the NSCharacterSet documentation:

I'm not an expert here, but my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set.

-- 
Greg Parker     gpar...@apple.com     Runtime Wrangler
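Greg's length example can be checked by hand. Code points above U+FFFF are stored as a surrogate pair, so a one-character string holding U+1F6B2 has a UTF-16 length of 2. A minimal C sketch of the encoding step (utf16_encode is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-16. Returns the number of code
   units written: 1 for the BMP, 2 for a surrogate pair, or 0 if `cp` is not
   a valid scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* not encodable */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one code unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For U+1F6B2 this produces 0xD83D 0xDEB2, two code units, which is what a length count in UTF-16 code units reports.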
Re: Where is my bicycle?
On Apr 6, 2015, at 12:29 , Greg Parker gpar...@apple.com wrote:
> my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set.

I would have said so, too, except that NSCharacterSet has this ‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: (unichar)’ is apparently for.
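The difference between the two methods comes down to parameter width: in Foundation, unichar is a 16-bit type and UTF32Char is a 32-bit type. The C sketch below uses stand-in typedefs (hypothetical names, not the Foundation headers) to show what happens when a supplementary code point is forced through a 16-bit parameter. It illustrates the mechanism only; it is not a claim about how NSCharacterSet is implemented.

```c
#include <stdint.h>

typedef uint16_t toy_unichar;    /* stand-in for Foundation's 16-bit unichar */
typedef uint32_t toy_UTF32Char;  /* stand-in for the 32-bit UTF32Char */

/* Converting to a 16-bit type silently keeps only the low 16 bits,
   so a supplementary code point lands on an unrelated BMP code point. */
toy_unichar as_unichar(toy_UTF32Char cp)
{
    return (toy_unichar)cp;
}
```

The result has the same shape as the symptom Gerriet reports in point 2: only the low 16 bits survive, so U+1F6B2 becomes U+F6B2 (a private-use code point) and U+1D11E (MUSICAL SYMBOL G CLEF) becomes U+D11E, an unrelated Hangul syllable.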
Re: Where is my bicycle?
Have you checked the Font you are using to display the character string to see if it contains the bicycle character? If not, you probably won’t get the character you seek.

- Jack
Re: Where is my bicycle?
If you're unable to do what you need with Cocoa, maybe it would work to use ICU.

Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/
Available for Software Development in the Portland, Oregon Metropolitan Area.
Re: Where is my bicycle?
Your bicycle showed up in my GMail in Firefox on Yosemite, but not in Safari on my Mom's iMac running Tiger.

Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/
Available for Software Development in the Portland, Oregon Metropolitan Area.
Re: Where is my bicycle?
On Apr 6, 2015, at 2:20 PM, pscott psc...@skycoast.us wrote:
> On 4/6/2015 12:29 PM, Greg Parker wrote:
>> I'm not an expert here, but my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set.
> That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 characters isn't supported it should not be documented as supporting UTF-16.

No, it's not UCS-2. The API generally works as if it were manipulating an array of UTF-16 code units. @"🚲" displays correctly; it would not if the system were truly UCS-2.

-- 
Greg Parker     gpar...@apple.com     Runtime Wrangler
Re: Where is my bicycle?
On 4/6/2015 12:29 PM, Greg Parker wrote:
> I'm not an expert here, but my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set.

That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 characters isn't supported, it should not be documented as supporting UTF-16.

Paul
Re: Where is my bicycle?
On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote:
> 2. characterSetWithCharactersInString seems to take only the lower 16 bits of the code points in the string. Bug. Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 𝄞)

The implementation of +characterSetWithCharactersInString: does attempt to handle arbitrary code points. It tries to optimize strings that have no large code points; my guess is that it has a bug when the string has a mix of both. Please file a bug report.

-- 
Greg Parker     gpar...@apple.com     Runtime Wrangler
Re: Where is my bicycle?
On 4/6/2015 4:03 PM, Greg Parker wrote:
> On Apr 6, 2015, at 2:20 PM, pscott psc...@skycoast.us wrote:
>> On 4/6/2015 12:29 PM, Greg Parker wrote:
>>> I'm not an expert here, but my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set.
>> That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 characters isn't supported it should not be documented as supporting UTF-16.
> No, it's not UCS-2. The API generally works as if it were manipulating an array of UTF-16 code units. @"🚲" displays correctly; it would not if the system were truly UCS-2.

Right. But what you were describing *would* be UCS-2. To claim UTF-16 support, variable-length encoding must be handled. You cannot legitimately claim UTF-16 support by only handling a fixed-size encoding (i.e., a single code unit). So, if UTF-16 support is intended, as documented, then there has to be a bug, as I stated in an earlier post, and as you also did in a later post (albeit with different reasoning).
Re: Where is my bicycle?
On 4/6/2015 4:29 PM, pscott wrote:
> Right. But what you were describing *would* be UCS-2. To claim UTF-16 support, variable length encoding must be handled.

Greg, I went back and re-read your post a number of times, and after the third reading I got your meaning. I see now you were describing a variable-length encoding. My apologies for misunderstanding your point.
Re: Where is my bicycle?
On Apr 6, 2015, at 16:29 , pscott psc...@skycoast.us wrote: But what you were describing *would* be UCS-2. To claim UTF-16 support, variable-length encoding must be handled. It’s pretty much understood (on this list) that NSString is based on UTF-16, so we tend to cut the corner that’s bothering you. This is complicated by the fact that NSString is a bit weird. Its underlying representation is UTF-16 strings, but its API is “array of UTF-16 code units”. That means you can create an invalid UTF-16 string with the NSString API. The fact that we’re not supposed to do that is also pretty much understood. This messiness, along with the use of the ambiguous word “character” or “Unicode character” in the documentation, is all for historical reasons. NSCharacterSet is something else again. We don’t actually know whether:
- it’s implemented as a set of UTF-16 code units, instead of code points
- it handles UTF-16 surrogate pairs properly, in which of its API methods
- it handles UTF-32 code units properly, in which of its API methods
- it has bugs that prevent it from doing what it’s intended to do, whatever that is
Greg has basically given us the answers: “not code units”, “possibly”, “it’s supposed to”, and “probably”. :)
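Quincey's point that the NSString API lets you build invalid UTF-16 (for instance, a substring whose range splits a surrogate pair) can be illustrated with a well-formedness check. This is a minimal sketch over a raw array of 16-bit code units; utf16_is_well_formed is a hypothetical name, not something NSString provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Returns true if the code-unit array is well-formed UTF-16:
   every high surrogate is immediately followed by a low surrogate,
   and no low surrogate appears on its own. */
static bool utf16_is_well_formed(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (is_high_surrogate(units[i])) {
            if (i + 1 >= count || !is_low_surrogate(units[i + 1]))
                return false;   /* lead surrogate without a trail */
            i++;                /* skip the trail surrogate */
        } else if (is_low_surrogate(units[i])) {
            return false;       /* trail surrogate without a lead */
        }
    }
    return true;
}
```

An array ending in a lone 0xD83D (half a bicycle) is rejected, while the full pair 0xD83D 0xDEB2 is accepted.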
Re: Where is my bicycle?
On 7 Apr 2015, at 05:44, Greg Parker gpar...@apple.com wrote: On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote: 2. characterSetWithCharactersInString seems to take only the lower 16 bits of the code points in the string. Bug. It works OK, though, if all chars in the string have code points ≥ 0x10000 (e.g. 𝄞). The implementation of +characterSetWithCharactersInString: does attempt to handle arbitrary code points. It tries to optimize strings that have no large code points; my guess is that it has a bug when the string has a mix of both. Please file a bug report. Done: 20444816 Kind regards, Gerriet.
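Gerriet's diagnosis, that only the lower 16 bits of each code point survive, can be sketched side by side with a correct decode. This illustrates the symptom under that assumption; it is not Apple's implementation, and truncated_code_point and utf16_to_code_points are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The observed symptom, modelled as an assumption (NOT Apple's code):
   only the low 16 bits of a code point are kept. */
static uint32_t truncated_code_point(uint32_t cp)
{
    return cp & 0xFFFF;
}

/* What a UTF-16-aware implementation should do: combine each surrogate
   pair into one code point. Unpaired surrogates are passed through as-is.
   Writes at most `count` values to `out` and returns how many it wrote. */
static size_t utf16_to_code_points(const uint16_t *units, size_t count,
                                   uint32_t *out)
{
    assert((units != NULL && out != NULL) || count == 0);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            /* lead + trail surrogate -> one supplementary code point */
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00u);
            i++;
        }
        out[n++] = u;
    }
    return n;
}
```

Decoding "abc 🚲 xyz" (10 code units) yields 9 code points with 0x1F6B2 at index 4; truncating that value gives 0xF6B2, the same value that shows up in Gerriet's bitmap dump.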
Re: Where is my bicycle?
On 7 Apr 2015, at 03:42, Quincey Morris quinceymor...@rivergatesoftware.com wrote: On Apr 6, 2015, at 12:29 , Greg Parker gpar...@apple.com wrote: my understanding is that when Cocoa says "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API was designed when Unicode was still a true 16-bit character set. I would have said so, too, except that NSCharacterSet has this ‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: (unichar)’ is apparently for. Well, it is really quite simple: by "character", NSString (and others) mean an unsigned short in UTF-16 representation, but "LongCharacter" means a Unicode code point. Both definitions were the same in Unicode 1.0 (up to about 1996), when Unicode was 16 bits only. Starting with 2.0 it became 21 bits. They are still the same for code points below 0x10000, i.e. Plane 0, the Basic Multilingual Plane. Kind regards, Gerriet.
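The difference between a unichar "character" and a UTF32Char "long character" comes down to planes. A small C sketch relying only on the Unicode plane layout; the helper names are hypothetical, and uint16_t/uint32_t stand in for Foundation's unichar and UTF32Char.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* uint16_t stands in for Foundation's unichar ("character"),
   uint32_t for UTF32Char ("long character"). */

/* Unicode has 17 planes of 0x10000 code points; plane 0 is the BMP. */
static unsigned plane_of(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return (unsigned)(cp >> 16);
}

/* True when the code point is a BMP scalar value, i.e. a single
   unichar denotes exactly this code point. */
static bool fits_in_one_unichar(uint32_t cp)
{
    return plane_of(cp) == 0 && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Number of UTF-16 code units (NSString "characters") a scalar value needs. */
static unsigned utf16_units_for(uint32_t cp)
{
    return plane_of(cp) == 0 ? 1u : 2u;
}
```

U+3004 is in plane 0 and fits in one unichar; U+1F6B2 is in plane 1 and needs two, so only longCharacterIsMember: can ask about it directly.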
Re: Where is my bicycle?
On 6 Apr 2015, at 23:36, Charles Srstka cocoa...@charlessoft.com wrote: On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote: OS X 10.10.2
NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2
NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 (0x1F6B2)",__FUNCTION__, string, pq ? "" : "no " );
This prints: CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
Where has my bicycle gone? What am I doing wrong? Objective-C doesn’t support Unicode in source files (although Swift does). Charles
If this is so: why did my compiler not tell me about this? Why does this:
NSString *string = @"abc 〄 xyz";// JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004
NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
BOOL pq = [ charSet longCharacterIsMember: 0x3004 ];
NSLog(@"%s CharacterSet from \"%@\" contains %s〄 (0x3004)",__FUNCTION__, string, pq ? "" : "no " );
print: CharacterSet from "abc 〄 xyz" contains 〄 (0x3004)
Kind regards, Gerriet.
Re: Where is my bicycle?
On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said: Objective-C doesn’t support Unicode in source files (although Swift does). Yes it does, and it has for many years too. Cheers, -- Sean McBride, B. Eng s...@rogue-research.com Rogue Research www.rogue-research.com Mac Software Developer Montréal, Québec, Canada
Re: Where is my bicycle?
On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann gerr...@mdenkmann.de wrote: OS X 10.10.2
NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2
NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 (0x1F6B2)",__FUNCTION__, string, pq ? "" : "no " );
This prints: CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
Where has my bicycle gone? What am I doing wrong? Objective-C doesn’t support Unicode in source files (although Swift does). Charles
Re: Where is my bicycle?
On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann gerr...@mdenkmann.de wrote: NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2 If this is so: why did my compiler not tell me about this? NSString *string = @"abc 〄 xyz";// JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004 Perhaps because U+3004 is a 2-byte value and U+1F6B2 is not? Have you tried changing the encoding of your source file to something else? You'll probably have to store such strings in .strings files or hardcode the hex values and build the strings from that. -- Steve Mills Drummer, Mac geek
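Steve's suggestion to hardcode the hex values can be sketched in C by encoding a code point to UTF-8 bytes, so the source file itself only needs ASCII. utf8_encode is a hypothetical helper; in Objective-C the resulting bytes could then be passed to -[NSString initWithBytes:length:encoding:] with NSUTF8StringEncoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 (hypothetical helper).
   out must have room for 4 bytes; returns the number written (1..4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {                 /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 11 payload bits */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 3 bytes: rest of the BMP */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: supplementary planes, e.g. the bicycle */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

U+1F6B2 encodes to F0 9F 9A B2 and U+3004 to E3 80 84.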
Re: Where is my bicycle?
On Apr 6, 2015, at 11:49 AM, Sean McBride s...@rogue-research.com wrote: On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said: Objective-C doesn’t support Unicode in source files (although Swift does). Yes it does, and it has for many years too. Huh, I just checked the documentation, and you’re right; they appear to have changed that at some point. It definitely used to say in there that using Unicode in a source file was officially verboten, although it usually worked anyway by accident. The problem, then, is likely that NSCharacterSet considers a “character” to be simply a UTF-16 code unit, rather than a true Unicode character as Swift does. Sorry for the noise. Charles
Re: Where is my bicycle?
On 6 Apr 2015, at 23:52, Steve Mills sjmi...@mac.com wrote: On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann gerr...@mdenkmann.de wrote: NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2 If this is so: why did my compiler not tell me about this? NSString *string = @"abc 〄 xyz";// JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004 Perhaps because U+3004 is a 2-byte value and U+1F6B2 is not? Have you tried changing the encoding of your source file to something else? You'll probably have to store such strings in .strings files or hardcode the hex values and build the strings from that. You are right: my string looks like:
string "abc 🚲 xyz" contains:
char[ 0] = 0x00061
char[ 1] = 0x00062
char[ 2] = 0x00063
char[ 3] = 0x00020
char[ 4] = 0x1f6b2 ← this is a bicycle
char[ 5] = 0x00020
char[ 6] = 0x00078
char[ 7] = 0x00079
char[ 8] = 0x0007a
which seems OK. But when I print the bits in the NSCharacterSet's bitmapRepresentation I get:
bit[ 1] = 0x00020 =
bit[ 2] = 0x00061 = a
bit[ 3] = 0x00062 = b
bit[ 4] = 0x00063 = c
bit[ 5] = 0x00078 = x
bit[ 6] = 0x00079 = y
bit[ 7] = 0x0007a = z
bit[ 8] = 0x0f6b2 = ← this should be 0x1f6b2, which is a bicycle.
Looks like there is a bug in characterSetWithCharactersInString, or not? Kind regards, Gerriet.
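To read a bitmapRepresentation dump like the one above, it helps to know the layout. The sketch below assumes the layout described in Core Foundation's CFCharacterSetCreateBitmapRepresentation documentation (8192 bytes for the BMP, then, for each further plane present, one plane-index byte followed by 8192 bytes) and uses the bit test Apple documents, bitmap[n >> 3] & (1 << (n & 7)). Under that assumption, the bicycle should appear as bit 0xF6B2 of a plane-1 bitmap, not as bit 0xF6B2 of the BMP. bitmap_contains is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANE_BITMAP_BYTES = 8192 };   /* 2^16 bits, one per code point */

/* Test bit n of one plane's 8192-byte bitmap (Apple's documented expression). */
static bool plane_bit_is_set(const unsigned char *plane, uint32_t n)
{
    return (plane[n >> 3] & (1u << (n & 7))) != 0;
}

/* Look up a code point in a buffer with the ASSUMED layout:
   8192 bytes for the BMP, then for each further plane present,
   one plane-index byte followed by that plane's 8192 bytes. */
static bool bitmap_contains(const unsigned char *data, size_t length, uint32_t cp)
{
    assert(cp <= 0x10FFFF && length >= PLANE_BITMAP_BYTES);
    uint32_t plane = cp >> 16;
    uint32_t offset = cp & 0xFFFF;
    if (plane == 0)
        return plane_bit_is_set(data, offset);
    for (size_t pos = PLANE_BITMAP_BYTES; pos + 1 + PLANE_BITMAP_BYTES <= length;
         pos += 1 + PLANE_BITMAP_BYTES) {
        if (data[pos] == plane)
            return plane_bit_is_set(data + pos + 1, offset);
    }
    return false;   /* that plane is not present at all */
}
```

With only the BMP section and bit 0xF6B2 set (as in the dump), a lookup of 0xF6B2 succeeds while 0x1F6B2 fails; once a plane-1 section with bit 0xF6B2 is appended, 0x1F6B2 is found.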