Re: Where is my bicycle?

2015-04-07 Thread Jens Alfke

> On Apr 7, 2015, at 12:59 PM, Charles Srstka  wrote:
> 
> I really want to see a Cyrillic version of Papyrus now. ;-)

http://ihateyouare.deviantart.com/art/Papyrus-Plain-Cyrillic-165111766 


You’re welcome :)

—Jens
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Where is my bicycle?

2015-04-07 Thread Charles Srstka
On Apr 7, 2015, at 2:24 PM, Jens Alfke  wrote:
> 
> This is the same process that allows you to put Japanese or Cyrillic 
> characters in a string and render them in Helvetica or Papyrus even though 
> those fonts don’t support those character sets.

I really want to see a Cyrillic version of Papyrus now. ;-)

Charles


Re: Where is my bicycle?

2015-04-07 Thread Jens Alfke

> On Apr 6, 2015, at 2:09 PM, Jack Brindle  wrote:
> 
> Have you checked the Font you are using to display the character string to 
> see if it contains the bicycle character? If not, you probably won’t get the 
> character you seek.

Fonts have nothing to do with it; they’re an aspect of rendering text, not of 
working with the text in memory. (If it weren’t this way, you wouldn’t be able 
to work with NSString at all; everything would have to be based on 
NSAttributedString to carry around the font info for every character.)

The bicycle is a well-defined Unicode character, an emoji. When it comes time 
to render it, the typesetter will look for a glyph in the current font for that 
character code. It probably won’t find one, so it will go through a series of 
fallback fonts looking for a glyph until it finds one in whatever internal font 
stores the emoji glyphs. Then it uses that font to render it.

This is the same process that allows you to put Japanese or Cyrillic characters 
in a string and render them in Helvetica or Papyrus even though those fonts 
don’t support those character sets. They’re actually being rendered in whatever 
system font is the default for those character sets. This is all invisible to 
you unless you start diving down into the NSTypesetter or CoreText APIs.
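
As a rough illustration of the fallback walk described above, here is a toy model in C. It is not how CoreText or NSTypesetter is implemented; the ToyFont struct, the single coverage range per font, and the function name are all made up for the sketch. It only shows the idea of "first font in the chain that has a glyph wins".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a font: a name plus one covered range. */
typedef struct {
    const char *name;
    uint32_t first, last;   /* covered code points, inclusive */
} ToyFont;

/* Walk the chain in order (requested font first, then fallbacks) and
 * return the name of the first font that covers cp, or NULL if none does. */
const char *toy_font_for(const ToyFont *chain, size_t count, uint32_t cp)
{
    assert(chain != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (cp >= chain[i].first && cp <= chain[i].last)
            return chain[i].name;
    }
    return NULL;
}
```

With a chain of a Latin-only font followed by a Cyrillic font and an emoji font, a lookup for 0x1F6B2 skips the first two entries and lands on the emoji font, while the string in memory never changes.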

—Jens

Re: Where is my bicycle?

2015-04-07 Thread Quincey Morris
On Apr 7, 2015, at 02:21 , Gerriet M. Denkmann  wrote:
> 
> it allowed me to create a replacement for characterSetWithCharactersInString: 
> which actually works

The only suggestion I have is to return ‘mus.copy’ instead of ‘mus’.

Given that we know NSCharacterSet has some optimized internal representations, 
it’s possible that NSMutableCharacterSet doesn’t use them, since there’s no 
point until you’re “finished” mutating. If you’re using a wide range of UTF-32 
values, the mutable object might be quite large, and taking an immutable copy 
might produce a very much smaller object.

Or not.




Re: Where is my bicycle?

2015-04-07 Thread Gerriet M. Denkmann

> On 7 Apr 2015, at 00:15, Quincey Morris  
> wrote:
> 
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann  wrote:
> 
> A suggestion, though:
> 
> Try building your character set using ‘characterSetWithRange:’ and/or the 
> NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
> Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
> reasons — when using NSStrings explicitly.

This turned out to be an excellent idea - it allowed me to create a replacement 
for characterSetWithCharactersInString: which actually works:

//  bug work-around
+ (NSCharacterSet *)gmdCharacterSetWithCharactersInString: (NSString *)string
{
    if ( string.length == 0 )   //  return nil
    {
        NSLog(@"%s string \"%@\" is empty or nil → no CharacterSet.", __FUNCTION__, string);
        return nil;
    }

    NSData *dat = [ string dataUsingEncoding: NSUTF32StringEncoding ];
    const UTF32Char *bytes = dat.bytes;
    NSUInteger length = dat.length / sizeof(UTF32Char);

    NSMutableCharacterSet *mus = [ [ NSMutableCharacterSet alloc ] init ];
    for( NSUInteger i = 1; i < length; i++ )    //  ignore initial kUnicodeByteOrderMark
    {
        UTF32Char codePoint = bytes[i];
        [ mus addCharactersInRange: NSMakeRange( codePoint, 1 ) ];
    }

    return mus;
}

Thanks very much for your suggestion!


Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

> On 7 Apr 2015, at 05:44, Greg Parker  wrote:
> 
> 
>> On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann  
>> wrote:
>> 
>> 2. characterSetWithCharactersInString seems to take only the lower 16 bits 
>> of the code points in the string. Bug.
>> Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 
>> "𝄞🚲")
> 
> The implementation of +characterSetWithCharactersInString: does attempt to 
> handle arbitrary code points. It tries to optimize strings that have no 
> "large" code points; my guess is that it has a bug when the string has a mix 
> of both. Please file a bug report.

Done: 20444816

Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

> On 7 Apr 2015, at 03:42, Quincey Morris  
> wrote:
> 
> On Apr 6, 2015, at 12:29 , Greg Parker  wrote:
>> 
>> my understanding is that when Cocoa says "character" it usually means 
>> "UTF-16 code unit". @"🚲".length == 2, for example. Cocoa's string API 
>> was designed when Unicode was still a true 16-bit character set.
> 
> I would have said so, too, except that NSCharacterSet has this 
> ‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the 
> parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: 
> (unichar)’ is apparently for.

Well, it is really quite simple:
NSString (and others) means by "character": an unsigned short (unichar), 
i.e. one UTF-16 code unit.
But "long character" (UTF32Char) means: "Unicode code point".

Both definitions were the same in Unicode 1.0 (up to about 1996), when Unicode 
was 16 bits only. Starting with 2.0 it became 21 bits.
They are still the same for code points below 0x10000, i.e. Plane 0, the Basic 
Multilingual Plane.
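
To make the difference concrete, here is a minimal C sketch of how one code point turns into one or two UTF-16 code units. The function name is made up for illustration and nothing here is Foundation API; it is just the standard UTF-16 arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units into out[] and returns how many were written.
 * Returns 0 for values that cannot be encoded (surrogates, > 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* Plane 0: one unit, same value */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits left, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For the bicycle, U+1F6B2, this yields the two units 0xD83D 0xDEB2, which is why a string holding only that emoji reports a length of 2.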

Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Michael Crawford
If you're unable to do what you need with Cocoa, maybe it would work to use ICU.


Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/

   Available for Software Development in the Portland, Oregon Metropolitan
Area.



Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 16:29 , pscott  wrote:
> 
> But what you were describing *would* be UCS-2. To claim UTF-16 support, 
> variable length encoding must be handled.

It’s pretty much understood — on this list — that NSString is based on UTF-16, 
so we tend to cut the corner that’s bothering you. This is complicated by the 
fact that NSString is a bit weird. Its underlying representation is UTF-16 
strings, but its API is “array of UTF-16 code units”. That means you can create 
an invalid UTF-16 string with the NSString API. The fact that we’re not 
supposed to do that is also pretty much understood.
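
For what "invalid UTF-16" means in practice, here is a small C sketch that checks an array of code units for unpaired surrogates. The function name is hypothetical and this is not an NSString API; it only illustrates the rule that an API working on individual code units does not enforce by itself (for instance, cutting a string between the two halves of a surrogate pair would leave a lone high surrogate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every surrogate in units[0..count) is part of a
 * correctly ordered high/low pair, false otherwise. */
bool utf16_is_well_formed(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            /* high surrogate */
            if (i + 1 >= count)
                return false;                        /* nothing follows */
            uint16_t next = units[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;                        /* not a low surrogate */
            i++;                                     /* skip the low half */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;                            /* stray low surrogate */
        }
    }
    return true;
}
```

The sequence 0xD83D 0xDEB2 passes, while either unit on its own, or the two in reverse order, does not.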

This messiness, along with the use of the ambiguous word “character” or 
“Unicode character” in the documentation, is all for historical reasons.

NSCharacterSet is something else again. We don’t actually know whether:

— it’s implemented as a set of UTF-16 code units, instead of code points

— it handles UTF-16 surrogate pairs properly, in which of its API methods

— it handles UTF-32 code units properly, in which of its API methods

— it has bugs that prevent it from doing what it’s intended to do, whatever 
that is

Greg has basically given us the answers: “not code units”, “possibly”, “it’s 
supposed to”, and “probably”. :)




Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 4:29 PM, pscott wrote:
> On 4/6/2015 4:03 PM, Greg Parker wrote:
>> On Apr 6, 2015, at 2:20 PM, pscott  wrote:
>>> On 4/6/2015 12:29 PM, Greg Parker wrote:
>>>> I'm not an expert here, but my understanding is that when Cocoa 
>>>> says "character" it usually means "UTF-16 code unit". @"🚲".length 
>>>> == 2, for example. Cocoa's string API was designed when Unicode was 
>>>> still a true 16-bit character set.
>>> That would be UCS-2 encoding. If the full Unicode character set of 
>>> 1,112,064 characters isn't supported it should not be documented as 
>>> supporting UTF-16.
>> No, it's not UCS-2. The API generally works as if it were 
>> manipulating an array of UTF-16 code units. @"🚲" displays correctly; 
>> it would not if the system were truly UCS-2.
> Right. But what you were describing *would* be UCS-2. To claim UTF-16 
> support, variable length encoding must be handled. You cannot 
> legitimately claim UTF-16 support by only handling a fixed-size 
> encoding (i.e., a single code unit). So, if UTF-16 support is 
> intended, as documented, then there has to be a bug, as I stated in an 
> earlier post; and as you did also in a later post (albeit with a 
> different reasoning). 


Greg, I went back and re-read your post a number of times, and after the 
third reading, I got your meaning. I see now you were describing a 
variable length encoding. My apology for misunderstanding your point.





Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 4:03 PM, Greg Parker wrote:
> On Apr 6, 2015, at 2:20 PM, pscott  wrote:
>> On 4/6/2015 12:29 PM, Greg Parker wrote:
>>> I'm not an expert here, but my understanding is that when Cocoa says 
>>> "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for 
>>> example. Cocoa's string API was designed when Unicode was still a true 
>>> 16-bit character set.
>> That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 
>> characters isn't supported it should not be documented as supporting UTF-16.
> No, it's not UCS-2. The API generally works as if it were manipulating an 
> array of UTF-16 code units. @"🚲" displays correctly; it would not if the 
> system were truly UCS-2.

Right. But what you were describing *would* be UCS-2. To claim UTF-16 
support, variable length encoding must be handled. You cannot 
legitimately claim UTF-16 support by only handling a fixed-size encoding 
(i.e., a single code unit). So, if UTF-16 support is intended, as 
documented, then there has to be a bug, as I stated in an earlier post; 
and as you did also in a later post (albeit with a different reasoning).






Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

> On Apr 6, 2015, at 2:20 PM, pscott  wrote:
> 
>> On 4/6/2015 12:29 PM, Greg Parker wrote:
>> I'm not an expert here, but my understanding is that when Cocoa says 
>> "character" it usually means "UTF-16 code unit". @"🚲".length == 2, for 
>> example. Cocoa's string API was designed when Unicode was still a true 
>> 16-bit character set.
> 
> That would be UCS-2 encoding. If the full Unicode character set of 1,112,064 
> characters isn't supported it should not be documented as supporting UTF-16.

No, it's not UCS-2. The API generally works as if it were manipulating an array 
of UTF-16 code units. @"🚲" displays correctly; it would not if the system were 
truly UCS-2. 


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

> On Apr 6, 2015, at 11:15 AM, Gerriet M. Denkmann  wrote:
> 
> 2. characterSetWithCharactersInString seems to take only the lower 16 bits of 
> the code points in the string. Bug.
> Works ok though, if all chars in the string have code points ≥ 0x10000 (e.g. 
> "𝄞🚲")

The implementation of +characterSetWithCharactersInString: does attempt to 
handle arbitrary code points. It tries to optimize strings that have no "large" 
code points; my guess is that it has a bug when the string has a mix of both. 
Please file a bug report.


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread pscott

On 4/6/2015 12:29 PM, Greg Parker wrote:
> I'm not an expert here, but my understanding is that when Cocoa says 
> "character" it usually means "UTF-16 code unit". @"🚲".length == 2, 
> for example. Cocoa's string API was designed when Unicode was still a 
> true 16-bit character set.

That would be UCS-2 encoding. If the full Unicode character set of 
1,112,064 characters isn't supported it should not be documented as 
supporting UTF-16.


Paul





Re: Where is my bicycle?

2015-04-06 Thread Michael Crawford
Your bicycle showed up in my GMail in Firefox on Yosemite, but not in
Safari on my Mom's iMac running Tiger.
Michael David Crawford, Consulting Software Engineer
mdcrawf...@gmail.com
http://www.warplife.com/mdc/

   Available for Software Development in the Portland, Oregon Metropolitan
Area.


On Mon, Apr 6, 2015 at 2:09 PM, Jack Brindle  wrote:
> Have you checked the Font you are using to display the character string to 
> see if it contains the bicycle character? If not, you probably won’t get the 
> character you seek.
>
> - Jack

Re: Where is my bicycle?

2015-04-06 Thread Jack Brindle
Have you checked the Font you are using to display the character string to see 
if it contains the bicycle character? If not, you probably won’t get the 
character you seek.

- Jack


Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 12:29 , Greg Parker  wrote:
> 
> my understanding is that when Cocoa says "character" it usually means "UTF-16 
> code unit". @"🚲".length == 2, for example. Cocoa's string API was designed 
> when Unicode was still a true 16-bit character set.

I would have said so, too, except that NSCharacterSet has this 
‘longCharacterIsMember: (UTF32Char)’ API, which seems inexplicable if the 
parameter is a UTF-16 code unit, since that’s what ‘characterIsMember: 
(unichar)’ is apparently for.




Re: Where is my bicycle?

2015-04-06 Thread Greg Parker

> On Apr 6, 2015, at 10:15 AM, Quincey Morris 
>  wrote:
> 
>> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann  wrote:
>> 
>> Where is my bicycle gone? What am I doing wrong?
> 
> The problem is that it’s unclear whether the “characters” in NSCharacterSet 
> are internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
> something else. According to the NSCharacterSet documentation:

I'm not an expert here, but my understanding is that when Cocoa says 
"character" it usually means "UTF-16 code unit". @"🚲".length == 2, for example. 
Cocoa's string API was designed when Unicode was still a true 16-bit character 
set.


-- 
Greg Parker gpar...@apple.com Runtime Wrangler




Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

> On 7 Apr 2015, at 00:15, Quincey Morris  
> wrote:
> 
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann  wrote:
>> 
>> Where is my bicycle gone? What am I doing wrong?
> 
> Before this thread heads further into outer space…
> 
> I suspect it [NSCharacterSet] is just broken. Look here, for example:
> 
>   
> http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this
> 
> The problem is that it’s unclear whether the “characters” in NSCharacterSet 
> are internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
> something else. According to the NSCharacterSet documentation:
> 
>> "An NSCharacterSet object represents a set of Unicode-compliant characters.”
> 
> and:
> 
>> "The NSCharacterSet class declares the programmatic interface for an object 
>> that manages a set of Unicode characters (see the NSString class cluster 
>> specification for information on Unicode).”
> 
> According to the NSString documentation:
> 
>> "A string object presents itself as an array of Unicode characters (Unicode 
>> is a registered trademark of Unicode, Inc.). You can determine how many 
>> characters a string object contains with the length method and can retrieve 
>> a specific character with the characterAtIndex: method.”
> 
> Working backwards, we know that the characters that are counted by ‘-[NSString 
> length]’ are UTF-16 code units, so this all *possibly* implies that 
> NSCharacterSet characters are UTF-16 code units, too. Plus, back in 
> NSCharacterSet documentation:
> 
>> "NSCharacterSet’s principal primitive method, characterIsMember:, provides 
>> the basis for all other instance methods in its interface.”
> 
> If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.
> 
> Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the 
> API was enhanced in 10.2 (see: 
> http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html,
>  for some tantalizing hints about NSCharacterSet), the implementation was a 
> hack that works somehow but isn’t documented. I don’t think you’re going to 
> get any definitive answer except directly from Apple.
> 
> A suggestion, though:
> 
> Try building your character set using ‘characterSetWithRange:’ and/or the 
> NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
> Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
> reasons — when using NSStrings explicitly.

1. longCharacterIsMember: seems to be OK:
    NSCharacterSet *alphanumericCharacterSet = [ NSCharacterSet alphanumericCharacterSet ];
    BOOL pp = [ alphanumericCharacterSet longCharacterIsMember: 0x2f800 ];
returns YES, as it should.

2. characterSetWithCharactersInString: seems to take only the lower 16 bits of 
the code points in the string. Bug.
It works OK, though, if all chars in the string have code points ≥ 0x10000 
(e.g. "𝄞🚲").

3. The documentation about bitmapRepresentation is wrong. It says: "A raw 
bitmap representation of a character set is a byte array of 2^16 bits (that is, 
8192 bytes)."
But alphanumericCharacterSet has a bitmap with 32771 = 0x8003 bytes, which 
mostly looks OK.
It has some strange things at the end, though: 
0x2fa1e → 0x2fa2d 
0x30011 → 0x30207 
which I do not recognise as alphanumeric.

4. characterSetWithRange: works a bit better:
    NSCharacterSet *a = [ NSCharacterSet characterSetWithRange: NSMakeRange(0x1F6B2,1) ];
    BOOL pp = [ a longCharacterIsMember: 0x1F6B2 ];    // returns YES, as it should

But when I look at the bitmapRepresentation I see 16385 bytes with two bits 
set: 0x10000 and 0x1f6ba (8 bits off).

Looks like the format of the bitmapRepresentation is slightly more complex than 
documented.
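
One layout that fits the byte counts above is 8192 bytes for plane 0, followed 
by one chunk per additional plane, each chunk being a single plane-index byte 
plus 8192 bytes: 16385 = 8192 + 8193 and 32771 = 8192 + 3 × 8193. This is an 
assumption inferred from those numbers (it also agrees with how the 
CFCharacterSet documentation describes bitmap data, as far as I recall), not 
something the NSCharacterSet documentation states. A minimal sketch in plain C, 
which also compiles as Objective-C; bitmap_contains is a hypothetical helper, 
not an Apple API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PLANE_BYTES ((size_t)8192)   /* 65536 bits per Unicode plane */

/* Test membership of a code point in a bitmap laid out as assumed above:
 * [8192 bytes for plane 0] then zero or more [plane byte][8192 bytes]. */
static bool bitmap_contains(const uint8_t *bits, size_t len, uint32_t cp)
{
    if (cp > 0x10FFFF) return false;

    uint8_t plane = (uint8_t)(cp >> 16);      /* 0 for the BMP */
    uint16_t low = (uint16_t)(cp & 0xFFFF);   /* offset within the plane */
    const uint8_t *chunk = NULL;

    if (plane == 0) {
        if (len >= PLANE_BYTES) chunk = bits;
    } else {
        /* Walk the appended chunks until the plane-index byte matches. */
        size_t pos = PLANE_BYTES;
        while (pos + 1 + PLANE_BYTES <= len) {
            if (bits[pos] == plane) { chunk = bits + pos + 1; break; }
            pos += 1 + PLANE_BYTES;
        }
    }
    if (chunk == NULL) return false;

    /* Bit n of a plane lives in byte n >> 3, at bit position n & 7. */
    return (chunk[low >> 3] >> (low & 7)) & 1;
}
```

Under this reading, the two "bits" seen in item 4 are explained: the 
plane-index byte 0x01 sits at byte 8192, so it shows up as flat bit 0x10000, 
and the bicycle lands at flat bit 0x10008 + 0xf6b2 = 0x1f6ba, i.e. 8 bits later 
than a naive decoder expects. The same shift (16 bits for the second chunk, 
24 for the third) would account for the odd ranges at the end of item 3.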


Kind regards,

Gerriet.



Re: Where is my bicycle?

2015-04-06 Thread Paul Scott
On Apr 6, 2015, at 9:57 AM, Charles Srstka  wrote:
> 
> The problem, then, is likely the fact that NSCharacterSet considers a 
> “character” simply as a UTF-16 code unit, rather than a true Unicode 
> character as Swift does.

That should not matter. UTF-16 is a variable-length encoding. It is guaranteed 
to support all 1,112,064 possible Unicode scalar values. In order to do that it 
MUST be variable length: each character takes either 2 or 4 octets.

This appears to be a bug in the Objective-C handling of UTF-16.
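
To make the variable-length point concrete: a code point below U+10000 is one 
16-bit code unit, and anything from U+10000 up is split into a surrogate pair. 
The sketch below is plain C (it compiles as Objective-C too); the helper names 
are made up for illustration and are not Foundation API:

```c
#include <assert.h>
#include <stdint.h>

/* Number of UTF-16 code units needed for one code point: 1 or 2. */
static int utf16_length(uint32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}

/* Split a supplementary-plane code point into its surrogate pair. */
static void utf16_encode_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;                 /* 20 significant bits */
    *high = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}

/* Recombine a surrogate pair into the original code point. */
static uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
{
    return 0x10000 + ((uint32_t)(high - 0xD800) << 10)
                   + (uint32_t)(low - 0xDC00);
}
```

For the bicycle, utf16_encode_pair(0x1F6B2, ...) yields 0xD83D and 0xDEB2, 
which is why -[NSString length] counts it as two "characters" while U+3004 
counts as one.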


Re: Where is my bicycle?

2015-04-06 Thread Quincey Morris
On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann  wrote:
> 
> Where is my bicycle gone? What am I doing wrong?

Before this thread heads further into outer space…

I suspect it [NSCharacterSet] is just broken. Look here, for example:


http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this

The problem is that it’s unclear whether the “characters” in NSCharacterSet are 
internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
something else. According to the NSCharacterSet documentation:

> “An NSCharacterSet object represents a set of Unicode-compliant characters.”

and:

> “The NSCharacterSet class declares the programmatic interface for an object 
> that manages a set of Unicode characters (see the NSString class cluster 
> specification for information on Unicode).”

According to the NSString documentation:

> “A string object presents itself as an array of Unicode characters (Unicode 
> is a registered trademark of Unicode, Inc.). You can determine how many 
> characters a string object contains with the length method and can retrieve a 
> specific character with the characterAtIndex: method.”

Working backwards, we know that the characters counted by ‘-[NSString 
length]’ are UTF-16 code units, so this all *possibly* implies that 
NSCharacterSet characters are UTF-16 code units, too. Plus, back in the 
NSCharacterSet documentation:

> “NSCharacterSet’s principal primitive method, characterIsMember:, provides 
> the basis for all other instance methods in its interface.”


If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.

Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the 
API was enhanced in 10.2 (see 
http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html 
for some tantalizing hints about NSCharacterSet), the implementation was a 
hack that works somehow but isn’t documented. I don’t think you’re going to get 
any definitive answer except directly from Apple.

A suggestion, though:

Try building your character set using ‘characterSetWithRange:’ and/or the 
NSMutableCharacterSet methods that add ranges, instead of using NSStrings. 
Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility 
reasons — when using NSStrings explicitly.



Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

> On 6 Apr 2015, at 23:52, Steve Mills  wrote:
> 
> On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann  wrote:
>> 
>>> 
>>>> NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
>>> 
>> 
>> If this is so: why did my compiler not tell me about this?
>> 
>>  NSString *string = @"abc 〄 xyz";// JAPANESE INDUSTRIAL STANDARD 
>> SYMBOL = U+3004
> 
> Perhaps because U+3004 is a 2-byte value and U+1f6b2 is not? Have you tried 
> changing the encoding of your source file to something else? You'll probably 
> have to store such strings in .strings or hardcode the hex values and build 
> the strings from that.

You are right: 

My string looks like:
 string "abc 🚲 xyz" contains:
char[  0] = 0x00061
char[  1] = 0x00062
char[  2] = 0x00063
char[  3] = 0x00020
char[  4] = 0x1f6b2 ← this is a bicycle
char[  5] = 0x00020
char[  6] = 0x00078
char[  7] = 0x00079
char[  8] = 0x0007a

which seems ok.

But when I print the bits in NSCharacterSet bitmapRepresentation I get:

 bit[  1] = 0x00020 = " "
 bit[  2] = 0x00061 = "a"
 bit[  3] = 0x00062 = "b"
 bit[  4] = 0x00063 = "c"
 bit[  5] = 0x00078 = "x"
 bit[  6] = 0x00079 = "y"
 bit[  7] = 0x0007a = "z"
 bit[  8] = 0x0f6b2 = ""   ← this should be 0x1f6b2, which is a bicycle.

Looks like there is a bug in characterSetWithCharactersInString, or not?
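
The stray 0x0f6b2 is exactly what remains of 0x1f6b2 once everything above the 
low 16 bits is dropped, which fits a truncation to a 16-bit unichar somewhere 
during set construction. That is only a guess from the dump; Apple's 
implementation was not inspected. The arithmetic, as a plain C sketch with a 
hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: force a code point into a 16-bit unichar,
 * keeping only its low 16 bits (this is the suspected bug, not a fix). */
static uint16_t truncate_to_unichar(uint32_t code_point)
{
    return (uint16_t)(code_point & 0xFFFFu);
}
```

truncate_to_unichar(0x1F6B2) gives 0xF6B2, while a BMP character such as 
U+3004 passes through unchanged, matching the fact that the 〄 test succeeds. 
Note also that the dump shows neither 0xd83d nor 0xdeb2, so the set does not 
seem to have been built from the raw UTF-16 code units either.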

Kind regards,

Gerriet.




Re: Where is my bicycle?

2015-04-06 Thread Charles Srstka
On Apr 6, 2015, at 11:49 AM, Sean McBride  wrote:
> 
> On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said:
> 
>> Objective-C doesn’t support Unicode in source files (although Swift does).
> 
> Yes it does, and it has for many years too.

Huh, I just checked the documentation, and you’re right, they appear to have 
changed that at some point. It definitely used to say in there that using 
Unicode in a source file was officially verboten, although it usually 
accidentally worked anyway.

The problem, then, is likely the fact that NSCharacterSet considers a 
“character” simply as a UTF-16 code unit, rather than a true Unicode character 
as Swift does.

Sorry for the noise.

Charles



Re: Where is my bicycle?

2015-04-06 Thread Steve Mills
On Apr 6, 2015, at 11:45:52, Gerriet M. Denkmann  wrote:
> 
>> 
>>> NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2
>> 
> 
> If this is so: why did my compiler not tell me about this?
> 
>   NSString *string = @"abc 〄 xyz";// JAPANESE INDUSTRIAL STANDARD 
> SYMBOL = U+3004

Perhaps because U+3004 is a 2-byte value and U+1f6b2 is not? Have you tried 
changing the encoding of your source file to something else? You'll probably 
have to store such strings in .strings or hardcode the hex values and build the 
strings from that.

--
Steve Mills
Drummer, Mac geek



Re: Where is my bicycle?

2015-04-06 Thread Sean McBride
On Mon, 6 Apr 2015 11:36:38 -0500, Charles Srstka said:

>Objective-C doesn’t support Unicode in source files (although Swift does).

Yes it does, and it has for many years too.

Cheers,

-- 

Sean McBride, B. Eng                 s...@rogue-research.com
Rogue Research                        www.rogue-research.com
Mac Software Developer              Montréal, Québec, Canada


Re: Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann

> On 6 Apr 2015, at 23:36, Charles Srstka  wrote:
> 
>> 
>> On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann  
>> wrote:
>> 
>> OS X 10.10.2
>> 
>>  NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2
>>  NSCharacterSet *charSet = [ NSCharacterSet 
>> characterSetWithCharactersInString: string ];
>>  BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
>>  NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 
>> (0x1F6B2)",__FUNCTION__, string, pq ? "" : "no "); 
>> 
>> This prints:
>> 
>> CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
>> 
>> Where is my bicycle gone? What am I doing wrong?
> 
> Objective-C doesn’t support Unicode in source files (although Swift does).
> 
> Charles
> 

If this is so: why did my compiler not tell me about this?

Why does this:
    NSString *string = @"abc 〄 xyz";    // JAPANESE INDUSTRIAL STANDARD SYMBOL = U+3004
    NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
    BOOL pq = [ charSet longCharacterIsMember: 0x3004 ];
    NSLog(@"%s CharacterSet from \"%@\" contains %s〄 (0x3004)",__FUNCTION__, string, pq ? "" : "no ");

print:
    CharacterSet from "abc 〄 xyz" contains 〄 (0x3004)


Kind regards,

Gerriet.




Re: Where is my bicycle?

2015-04-06 Thread Charles Srstka
> 
> On Apr 6, 2015, at 11:19 AM, Gerriet M. Denkmann  wrote:
> 
> OS X 10.10.2
> 
>   NSString *string = @"abc 🚲 xyz";// BICYCLE = U+1F6B2
>   NSCharacterSet *charSet = [ NSCharacterSet 
> characterSetWithCharactersInString: string ];
>   BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
>   NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 
> (0x1F6B2)",__FUNCTION__, string, pq ? "" : "no "); 
> 
> This prints:
> 
> CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)
> 
> Where is my bicycle gone? What am I doing wrong?

Objective-C doesn’t support Unicode in source files (although Swift does).

Charles



Where is my bicycle?

2015-04-06 Thread Gerriet M. Denkmann
OS X 10.10.2

    NSString *string = @"abc 🚲 xyz";    // BICYCLE = U+1F6B2
    NSCharacterSet *charSet = [ NSCharacterSet characterSetWithCharactersInString: string ];
    BOOL pq = [ charSet longCharacterIsMember: 0x1F6B2 ];
    NSLog(@"%s CharacterSet from \"%@\" contains %s🚲 (0x1F6B2)",__FUNCTION__, string, pq ? "" : "no ");

This prints:

CharacterSet from "abc 🚲 xyz" contains no 🚲 (0x1F6B2)

Where is my bicycle gone? What am I doing wrong?

Gerriet.

