Re: Unicode filenames with Apple File System and UIManagedDocument

2017-03-08 Thread Peter Edberg

> On Mar 8, 2017, at 1:44 PM, David Reed  wrote:
> 
> 
>> On Mar 8, 2017, at 4:35 PM, Peter Edberg  wrote:
>> 
>> 
>>> On Mar 8, 2017, at 12:00 PM, cocoa-dev-requ...@lists.apple.com wrote:
>>> 
>>> Message: 1
>>> Date: Tue, 07 Mar 2017 15:03:41 -0500
>>> From: davel...@mac.com
>>> To: Alastair Houghton ,   David Duncan
>>> 
>>> Cc: cocoa-dev list 
>>> Subject: Re: Unicode filenames with Apple File System and
>>> UIManagedDocument
>>> 
>>> 
>>> 
>>> My app has the option to zip up the directories UIManagedDocument creates 
>>> and email it (so users can back up their data or share it with others). The 
>>> person sent it to me. Below is what I did in the Terminal so you can see 
>>> what happens when I try to unzip it. If this doesn’t come through on the 
>>> email list with the characters looking correct, I can screenshot it.
>>> 
>>> This is one of the data files that was created on iOS 10.2 and then won’t 
>>> open now on an iOS 10.3 device. It appears the directory name and zip file 
>>> name do not match and it won’t unzip correctly. It does create a directory 
>>> but the directory is empty instead of containing the StoreContent and 
>>> persistentStore files. The zip file is 34KB so it may or may not actually 
>>> have the data in it.
>>> 
>>> $ ls
>>> إعلام.zip
>> 
>> 
>> It is probably worth noting that the first Arabic character in the above 
>> filename (i.e. the one that appears on the right, adjacent to the period) 
>> has a canonical decomposition, as per this line from UnicodeData.txt 
>> (http://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt):
>> 0625;ARABIC LETTER ALEF WITH HAMZA BELOW;Lo;0;AL;0627 0655;...
>> 
>> That is, wherever canonical decomposition (e.g. NFD) is applied, this 
>> character U+0625 (UTF-8: D8 A5) is converted to the sequence U+0627 U+0655 
>> (UTF-8: D8 A7 D9 95).
>> 
>> This decomposition was introduced in Unicode 3.0. If there are processes 
>> that use decomposition according to Unicode 9 versus Unicode 2.x, or 
>> processes that don't decompose versus ones that do, then the filename bytes 
>> will be different.
>> 
>> - Peter E
> 
> Thanks Peter.
> 
> I am going to try to find time in the next few days to file a bug report. 
> I'll obviously include this information. Is there anything else you think I 
> should include?

Nothing else leaps out at me other than the usuals (system version, exact steps 
to repro, etc.).
- Peter E
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Unicode filenames with Apple File System and UIManagedDocument

2017-03-08 Thread Peter Edberg

> On Mar 8, 2017, at 12:00 PM, cocoa-dev-requ...@lists.apple.com wrote:
> 
> Message: 1
> Date: Tue, 07 Mar 2017 15:03:41 -0500
> From: davel...@mac.com
> To: Alastair Houghton , David Duncan
>   
> Cc: cocoa-dev list 
> Subject: Re: Unicode filenames with Apple File System and
>   UIManagedDocument
> 
> 
> 
> My app has the option to zip up the directories UIManagedDocument creates and 
> email it (so users can back up their data or share it with others). The 
> person sent it to me. Below is what I did in the Terminal so you can see what 
> happens when I try to unzip it. If this doesn’t come through on the email 
> list with the characters looking correct, I can screenshot it.
> 
> This is one of the data files that was created on iOS 10.2 and then won’t 
> open now on an iOS 10.3 device. It appears the directory name and zip file 
> name do not match and it won’t unzip correctly. It does create a directory 
> but the directory is empty instead of containing the StoreContent and 
> persistentStore files. The zip file is 34KB so it may or may not actually 
> have the data in it.
> 
> $ ls
> إعلام.zip


It is probably worth noting that the first Arabic character in the above 
filename (i.e. the one that appears on the right, adjacent to the period) has a 
canonical decomposition, as per this line from UnicodeData.txt 
(http://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt):
0625;ARABIC LETTER ALEF WITH HAMZA BELOW;Lo;0;AL;0627 0655;...

That is, wherever canonical decomposition (e.g. NFD) is applied, this character 
U+0625 (UTF-8: D8 A5) is converted to the sequence U+0627 U+0655 (UTF-8: D8 A7 
D9 95).

This decomposition was introduced in Unicode 3.0. If there are processes that 
use decomposition according to Unicode 9 versus Unicode 2.x, or processes that 
don't decompose versus ones that do, then the filename bytes will be different.
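
As a quick check of the byte sequences above, here is a minimal sketch in 
Python (chosen only because its standard unicodedata module exposes canonical 
decomposition; the variable names are illustrative and nothing here is taken 
from the app in question). It shows that the composed and decomposed forms of 
the same letter give different UTF-8 filename bytes, and that normalizing both 
sides to one form before comparing makes them match again.

```python
import unicodedata

composed = "\u0625"  # ARABIC LETTER ALEF WITH HAMZA BELOW
decomposed = unicodedata.normalize("NFD", composed)

# NFD yields U+0627 followed by the combining mark U+0655.
print([f"U+{ord(c):04X}" for c in decomposed])

# Same user-visible letter, different bytes on disk or in a zip entry name.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
print(composed_bytes.hex(" "), "vs", decomposed_bytes.hex(" "))

# Normalizing both sides to one form before comparing makes them match.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)
```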

- Peter E




Re: NSNumberFormatter negative and NSWindowsLatin1

2014-08-07 Thread Peter Edberg

On Aug 7, 2014, at 8:51 PM, cocoa-dev-requ...@lists.apple.com wrote:
> Date: Thu, 07 Aug 2014 18:31:45 -0500
> From: Kyle Sluder 
> To: cocoa-dev@lists.apple.com
> Subject: Re: NSNumberFormatter negative and NSWindowsLatin1
> Message-ID:
>   <1407454305.344597.150374101.4d345...@webmail.messagingengine.com>
> Content-Type: text/plain
> 
> On Thu, Aug 7, 2014, at 03:28 PM, Peter Edberg wrote:
>> Seems like the bug here is trying to save the file as WindowsLatin1.
> 
> I'm not sure how you intended this to be interpreted. If you meant it in
> the sense of "the bug involves saving as WindowsLatin1", then I agree.
> The saving process should handle the conversion.
> 
> If you meant that saving files as WindowsLatin1 is itself a bug, that is
> a very strong presumption about Totte's problem space and requirements.
> 
> --Kyle Sluder


I meant the former, apologies for not being more clear.
- Peter E



Re: NSNumberFormatter negative and NSWindowsLatin1

2014-08-07 Thread Peter Edberg
This is intentional. The character is ‘−’ U+2212 MINUS SIGN with UTF-8 
representation 0xE2 0x88 0x92. Many locales prefer the proper Unicode minus 
sign for negative numbers, instead of using U+002D HYPHEN-MINUS.

Seems like the bug here is trying to save the file as WindowsLatin1.
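
To see why the write fails, here is a small sketch (Python, used only as a 
convenient way to try encodings; "cp1252" is Python's name for Windows Latin 1, 
and the variable names are illustrative). It shows that U+2212 has a three-byte 
UTF-8 form but no Windows Latin 1 representation at all, while U+002D converts 
without trouble.

```python
minus_sign = "\u2212"    # MINUS SIGN, what the formatter emits in many locales
hyphen_minus = "\u002D"  # HYPHEN-MINUS, the ASCII '-'

utf8_bytes = minus_sign.encode("utf-8")  # three bytes: E2 88 92

# Windows Latin 1 (code page 1252) has no slot for U+2212, so a strict
# conversion fails rather than silently substituting another character.
try:
    minus_sign.encode("cp1252")
    minus_encodable = True
except UnicodeEncodeError:
    minus_encodable = False

hyphen_bytes = hyphen_minus.encode("cp1252")  # a single byte, 0x2D
print(utf8_bytes.hex(" "), minus_encodable, hyphen_bytes.hex())
```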

- Peter E

On Aug 7, 2014, at 12:00 PM, cocoa-dev-requ...@lists.apple.com wrote:
> Message: 5
> Date: Thu, 07 Aug 2014 08:43:04 +0200
> From: Totte Alm 
> To: "cocoa-dev@lists.apple.com List" 
> Subject: NSNumberFormatter negative and NSWindowsLatin1
> Message-ID: 
> Content-Type: text/plain; charset=windows-1252
> 
> Hello,
> 
> 
> In 10.9 I (or a client really) just stumbled upon a weirdness regarding 
> NSNumberFormatter (Swedish default settings), where a negative number is 
> transformed to a string and that string is later written to a file using 
> NSString Write.. 
> 
> The - sign (minus) that NSNumberFormatter puts into the string is a special 
> UTF-8 character /u8892 and it won’t convert when saving the file as 
> WindowsLatin1, NSString write giving an error.
> 
> Workaround:
> 
> [doubleFormatter setMinusSign:@"-"];
> 
> Anyone knows if this is intentional or just a bug that needs to be filed on 
> radar?
> 
> / Totte



Re: NSDateFormatter not working on iOS 5.

2012-02-13 Thread Peter Edberg

On Feb 2, 2012, at 7:56 AM, John Joyce wrote:

> 
> On Feb 2, 2012, at 2:20 AM, Peter Edberg wrote:
> 
>> 
>> On Jan 31, 2012, at 2:35 PM, cocoa-dev-requ...@lists.apple.com wrote:
>>> --
>>> 
>>> Message: 1
>>> Date: Tue, 31 Jan 2012 14:10:13 -0600
>>> From: Heath Borders 
>>> To: cocoa-dev 
>>> 
>>> Peter,
>>> 
>>> If I set the locale to "en_IN" shouldn't that show the short time zone?
>>> 
>>> NSLocale *indianEnglishLocale = [[[NSLocale alloc]
>>> initWithLocaleIdentifier:@"en_IN"] autorelease];
>>> NSTimeZone *timeZone = [NSTimeZone timeZoneWithName:@"Asia/Kolkata"];
>>> NSDateFormatter *dateFormatter = [[[NSDateFormatter alloc] init] 
>>> autorelease];
>>> dateFormatter.locale = indianEnglishLocale;
>>> dateFormatter.dateFormat = @"z";
>>> dateFormatter.timeZone = timeZone;
>>> 
>>> NSLog(@"date string: %@", [dateFormatter stringFromDate:[NSDate date]]);
>>> NSLog(@"time zone abbreviation: %@", [timeZone
>>> abbreviationForDate:[NSDate date]]);
>>> 
>>> output:
>>> 
>>> date string: GMT+05:30
>>> time zone abbreviation: IST
>>> 
>>> -Heath Borders
>> 
>> 
>> Heath,
>> Yes, you are correct, for the example you provided above, [dateFormatter 
>> stringFromDate:[NSDate date]] *should* use the short time zone name "IST". 
>> The fact that it does not is due to a deficiency in the "en_IN" locale data 
>> in the versions of CLDR data used by ICU in the current OSX and iOS releases 
>> (CLDR 1.9.1 and 2.0 respectively). The "en_IN" locale in those CLDR versions 
>> did not override or supplement any of the timezone name data from the base 
>> "en" locale, whose default content is for "en_US".
>> 
>> This is already fixed for the CLDR 21 release coming in a few days. That is 
>> being incorporated into ICU 49 which will be picked up in future OSX and iOS 
>> releases.
>> 
>> - Peter E
>> 

> Is there any recommended workaround approach for this kind of scenario until 
> those updates are incorporated?
> More specifically, how would one best implement a workaround that would be 
> easily overridden by (or not clash terribly) the fix when it is eventually 
> incorporated into a release?
> Any best practices or recommendations in that area?

I will need to think about the best current workaround for this, but right now 
I am (and have been) swamped, sorry.
- Peter E

> 




Re: NSDateFormatter not working on iOS 5.

2012-02-02 Thread Peter Edberg

On Jan 31, 2012, at 2:35 PM, cocoa-dev-requ...@lists.apple.com wrote:
> --
> 
> Message: 1
> Date: Tue, 31 Jan 2012 14:10:13 -0600
> From: Heath Borders 
> To: cocoa-dev 
> Subject: Re: NSDateFormatter not working on iOS 5.
> Message-ID:
>   
> Content-Type: text/plain; charset=UTF-8
> 
> Peter,
> 
> If I set the locale to "en_IN" shouldn't that show the short time zone?
> 
> NSLocale *indianEnglishLocale = [[[NSLocale alloc]
> initWithLocaleIdentifier:@"en_IN"] autorelease];
> NSTimeZone *timeZone = [NSTimeZone timeZoneWithName:@"Asia/Kolkata"];
> NSDateFormatter *dateFormatter = [[[NSDateFormatter alloc] init] autorelease];
> dateFormatter.locale = indianEnglishLocale;
> dateFormatter.dateFormat = @"z";
> dateFormatter.timeZone = timeZone;
> 
> NSLog(@"date string: %@", [dateFormatter stringFromDate:[NSDate date]]);
> NSLog(@"time zone abbreviation: %@", [timeZone
> abbreviationForDate:[NSDate date]]);
> 
> output:
> 
> date string: GMT+05:30
> time zone abbreviation: IST
> 
> -Heath Borders
> heath.bord...@gmail.com
> Twitter: heathborders
> http://heath-tech.blogspot.com


Heath,
Yes, you are correct, for the example you provided above, [dateFormatter 
stringFromDate:[NSDate date]] *should* use the short time zone name "IST". The 
fact that it does not is due to a deficiency in the "en_IN" locale data in the 
versions of CLDR data used by ICU in the current OSX and iOS releases (CLDR 
1.9.1 and 2.0 respectively). The "en_IN" locale in those CLDR versions did not 
override or supplement any of the timezone name data from the base "en" locale, 
whose default content is for "en_US".

This is already fixed for the CLDR 21 release coming in a few days. That is 
being incorporated into ICU 49 which will be picked up in future OSX and iOS 
releases.

- Peter E






Re: Printing an NSDate

2012-01-19 Thread Peter Edberg

On Jan 19, 2012, at 2:39 AM, cocoa-dev-requ...@lists.apple.com wrote:
> Date: Thu, 19 Jan 2012 13:41:54 +0700
> From: "Gerriet M. Denkmann" 
> Subject: Printing an NSDate
> To: cocoa-dev@lists.apple.com
> Message-ID: 
> Content-Type: text/plain; charset=us-ascii
> 
> I want to print a date on iOS 5.0.1 ignoring the locale.
> (this is for logging - not for showing strings to users)
> 
> I assume that NSDate has no sufficient parameters to control the output.
> So I tried to use NSDateFormatter.
> 
> The desired output is something like:
> NSString *template = @"HH:mm:ss EEE dd. MMM  zzz";
> 
> NSString *dateFormat = [ NSDateFormatter dateFormatFromTemplate: template 
> options: 0 locale: nil ];
> NSDateFormatter *dateFormatter = [ [ NSDateFormatter alloc ] init ];
> [ dateFormatter setDateFormat: dateFormat ];
> …


A "template" for [ NSDateFormatter dateFormatFromTemplate: …] is not a format; 
rather, it is just treated as a list of which fields are desired in a format. 
The order and formatting of those fields in the template is ignored. The whole 
point of dateFormatFromTemplate is to take a request for those fields and turn 
it into an actual locale-appropriate format containing those fields in the 
locale-appropriate order, with locale-appropriate formatting. If you want to 
set "HH:mm:ss EEE dd. MMM  zzz" itself as the format, then just pass that 
directly to [ dateFormatter setDateFormat: … ].

- Peter E





Re: NSDateFormatter not working on iOS 5.

2011-11-20 Thread Peter Edberg

On Nov 17, 2011, at 10:14 AM, Matt Neuburg wrote:

> On Wed, 16 Nov 2011 14:43:55 -0800, Peter Edberg  said:
>> ...
>> 
>> The issue is this: With the *short* timezone formats as specified by z 
>> (=zzz) or v (=vvv), there can be a lot of ambiguity. For example, "ET" for 
>> "Eastern Time" could apply to different time zones in many different regions. 
>> To improve formatting and parsing reliability, the short forms are only used 
>> in a locale if the "cu" (commonly used) flag is set for the locale. 
>> Otherwise, only the long forms are used (for both formatting and parsing).
>> 
>> For the "en" locale (= "en_US"), the cu flag is set for metazones such as 
>> Alaska, America_Central, America_Eastern, America_Mountain, America_Pacific, 
>> Atlantic, Hawaii_Aleutian, and GMT. It is *not* set for Europe_Central.
>> 
>> However, for the "en_GB" locale, the cu flag *is* set for Europe_Central.
>> 
>> So a formatter set for short timezone style "z" or "zzz" and locale "en" or 
>> "en_US" will not parse "CEST" or "CET", but if the locale is instead set to 
>> "en_GB" it *will* parse those. The "GMT" style will be parsed by all.
>> 
>> ...
> 
> Thanks; I suspected that something like this might be the case. But the 
> result, as I pointed out in my bug report (10447767), is that you can't 
> round-trip the abbreviations that the system itself gives you:
> 
>NSDictionary* d = (NSDictionary*)CFTimeZoneCopyAbbreviationDictionary();
>for (NSString* aZone in d.keyEnumerator)
>NSLog(@"%@ %@", aZone, [dateFormatter dateFromString:
>   [NSString stringWithFormat:@"2011-11-15 06:50:59.735 %@", aZone]]);
> 
> These are *your* abbreviations (by "you" I mean the system) that aren't 
> working. If they aren't going to work why are you giving them to me? Surely 
> there should be some call that provides me with a list of *legal* 
> abbreviations. m.



Yes, there is a disconnect here. The dictionary returned by 
CFTimeZoneCopyAbbreviationDictionary (and by +[NSTimeZone 
abbreviationDictionary]) is a standard internal mapping that does not depend on 
locale and is not built from the abbreviations used by ICU, and always maps 
ambiguous zone abbreviations to a particular zone name (as noted in the 
documentation) regardless of locale. This is not particularly useful for your 
purposes.

What would be more useful in this case is a function that takes a locale 
parameter, and returns a dictionary of the abbreviation mappings that are 
meaningful in that locale. This would be built from the locale-specific 
abbreviations used by ICU. If you agree, please file an enhancement request.

- Peter E



Re: NSDateFormatter not working on iOS 5.

2011-11-16 Thread Peter Edberg
The change in parsing of abbreviated time zone names in iOS 5.0 is a result of 
an intentional change in the open-source ICU 4.8 library (and the open-source 
CLDR 2.0 data that it uses), a modified version of which is used to implement 
some of the NSDateFormatter functionality.

The issue is this: With the *short* timezone formats as specified by z (=zzz) 
or v (=vvv), there can be a lot of ambiguity. For example, "ET" for "Eastern 
Time" could apply to different time zones in many different regions. To improve 
formatting and parsing reliability, the short forms are only used in a locale 
if the "cu" (commonly used) flag is set for the locale. Otherwise, only the 
long forms are used (for both formatting and parsing).

For the "en" locale (= "en_US"), the cu flag is set for metazones such as 
Alaska, America_Central, America_Eastern, America_Mountain, America_Pacific, 
Atlantic, Hawaii_Aleutian, and GMT. It is *not* set for Europe_Central.

However, for the "en_GB" locale, the cu flag *is* set for Europe_Central.

So a formatter set for short timezone style "z" or "zzz" and locale "en" or 
"en_US" will not parse "CEST" or "CET", but if the locale is instead set to 
"en_GB" it *will* parse those. The "GMT" style will be parsed by all.

If the formatter is set for the long timezone style "zzzz", and the locale is 
any of "en", "en_US", or "en_GB", then any of the following will be parsed, 
because they are unambiguous:
"Pacific Daylight Time"
"Central European Summer Time"
"Central European Time"

Hope this helps.

- Peter Edberg


> Date: Wed, 16 Nov 2011 16:29:41 +0800
> From: Kin Mak 
> Subject: Re: NSDateFormatter not working on iOS 5.
> To: Matt Neuburg 
> Cc: cocoa-dev@lists.apple.com
> 
> Matt,
> 
> The result differs not only on simulator, but also on iphones running 4.3 and 
> 5.0.
> I have also found out that ONLY some of the known time zones work fine on 
> 5.0. e.g. PDT, PST, GMT ...etc. However, all time zones work fine on iOS 4.3.
> In fact, I have already reported this as a bug to apple.
> 
> Thanks and Regards
> Kin
> 
> On Nov 15, 2011, at 11:11 PM, Matt Neuburg wrote:
> 
>> By the way, I can readily confirm that the results differ on the simulator 
>> for 4.3 vs. 5.0. m.
>> 
>>> On Fri, 11 Nov 2011 16:13:49 +0800, Kin Mak  said:
>>>> The following code used to work fine prior to iOS 5. The dateFromString 
>>>> method seems to stop working on iOS 5 and always returns null. I suspect 
>>>> this is a bug introduced in iOS 5.0. Has anyone encountered the same 
>>>> issue? Or am I missing something here?
>>>> 
>>>>NSDateFormatter *dateFormatter = [[NSDateFormatter alloc] init];
>>>>[dateFormatter setDateFormat:@"yyyy-MM-dd HH:mm:ss.SSS zzz"];
>>>>
>>>>.
>>>>//E.g currentString = @"2011-11-11 11:00:00.000 CET";   
>>>>NSDate *date = [dateFormatter dateFromString:currentString];
>>> 
>>> It works for me if I substitute "PST" for your "CET" - could the "CET" be 
>>> the problem?
>>> 
>>> I guess what I would do is start by testing whether the date formatter can 
>>> round-trip its own output, like this:
>>> 
>>>  NSDateFormatter *dateFormatter = [[NSDateFormatter alloc] init];
>>>  [dateFormatter setDateFormat:@"yyyy-MM-dd HH:mm:ss.SSS zzz"];
>>>  NSString* output = [dateFormatter stringFromDate:[NSDate date]];
>>>  NSLog(@"%@", output);
>>> 
>>>  NSDate* date = [dateFormatter dateFromString: output];
>>>  NSLog(@"%@", date);
>>> 
>>> It can on my machine (PST). If it can't on your machine, that sounds like a 
>>> bug.
>>> 
>>> Also, do try -[dateFormatter getObjectValue:forString:errorDescription:]; 
>>> It is definitely throwing an error (not very helpful, "The operation 
>>> couldn't be completed") for your string on my machine.
>>> 



Re: Is NSDateFormatter thread-safe?

2011-10-31 Thread Peter Edberg

On Oct 31, 2011, at 5:30 PM, cocoa-dev-requ...@lists.apple.com wrote:
> Date: Mon, 31 Oct 2011 17:27:10 -0700
> From: Jens Alfke 
> Subject: Is NSDateFormatter thread-safe?
> 
> I have some code that uses a shared NSDateFormatter instance to convert dates 
> to/from ISO-8601 format. This code is now being invoked on multiple threads, 
> and I'm getting intermittent malloc warnings about double frees, and one 
> actual crash.
> My assumption was that an NSDateFormatter should be thread-safe as long as I 
> treat it as read-only (not changing any attributes) but this makes me think 
> I'm wrong.

Short answer: NSDateFormatter is not thread-safe now even for that usage, but 
it will be more thread-safe in the future.

http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
does list NSDateFormatter in the thread-unsafe classes, about which it says 
"The following classes … are generally not thread-safe. In most cases, you can 
use these classes from any thread as long as you use them from only one thread 
at a time."

In particular, the ICU functions udat_format and udat_parse, which implement 
some of the NSDateFormatter functionality, lie when they claim their 
UDateFormat * parameter is const. In fact they modify the Calendar object owned 
by the UDateFormat object, setting it to the date to format or to the parsed 
date. There is an ICU ticket to fix this which will make these functions more 
thread-safe: http://bugs.icu-project.org/trac/ticket/8844
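
The documentation's advice, to use such an object from only one thread at a 
time, can be sketched as follows. This is a language-neutral illustration in 
Python, not Cocoa code: StatefulFormatter is a hypothetical stand-in for any 
formatter that, like udat_format, writes to internal state while formatting, 
and LockedFormatter with its lock is just one assumed way to serialize access.

```python
import threading
from datetime import datetime, timezone


class StatefulFormatter:
    """Hypothetical formatter that mutates internal state on every call."""

    def __init__(self):
        self._scratch = None  # plays the role of the shared Calendar object

    def format(self, when):
        self._scratch = when  # the "read-only" call writes here
        return self._scratch.strftime("%Y-%m-%dT%H:%M:%SZ")


class LockedFormatter:
    """Wrapper that lets only one thread use the formatter at a time."""

    def __init__(self):
        self._formatter = StatefulFormatter()
        self._lock = threading.Lock()

    def format(self, when):
        with self._lock:
            return self._formatter.format(when)


shared = LockedFormatter()
results = {}


def worker(day):
    when = datetime(2011, 10, day, 12, 0, 0, tzinfo=timezone.utc)
    results[day] = shared.format(when)


threads = [threading.Thread(target=worker, args=(d,)) for d in range(1, 9)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[1], results[8])
```

A separate formatter per thread is an alternative that avoids lock contention 
at the cost of creating more instances.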

- Peter E (who owns that ICU ticket)



Re: How to count composed characters in NSString?

2008-09-29 Thread Peter Edberg


On Sep 29, 2008, at 9:27 PM, David Niemeijer wrote:


Hi Douglas and Peter,

On Sep 29, 2008, at 6:39 PM, Douglas Davidson wrote:

On Sep 28, 2008, at 11:17 AM, David Niemeijer wrote:

I need to be able to display the number of characters to the user  
in a way that makes sense to them. If they see 3 I should report  
3. I also need it to cut-off certain input to the number of "real"  
characters and should not generate results that only make sense  
for a language like English where each 16 bits equals a single  
character.


What you are describing is the notion that Unicode sometimes refers  
to as a "user-perceived character", which in general can be  
somewhat ambiguous, since different users may have different  
perceptions, and since there are writing systems in which character  
boundaries are not at all similar to those in English.  To handle  
this sort of issue programmatically, Unicode defines what are known  
as "grapheme clusters", but there is not a single notion of  
grapheme cluster; there are several such notions, depending on  
precisely what it is you want.


These issues are covered in detail in Unicode Standard Annex #29,  
which gives a number of examples and some algorithms for  
determining grapheme cluster boundaries.  Grapheme clusters are  
similar to but not quite identical to composed character  
sequences.  For some purposes composed character sequences may be  
sufficient; NSString gives prominence to the notion of composed  
character sequence, because that is the most important concept for  
arbitrary text processing, but if you are really interested in user- 
perceived characters you may wish to use something else.


Thanks for your clarification. It is indeed the "grapheme clusters"  
that I am after. I need to be able to do things such as capitalize  
the first letter of a string and in doing statistical text analysis  
determine the number of "characters" of a text string. This  
description from the URL you pointed at fits my use quite well:  
"Grapheme cluster boundaries are important for collation, regular  
expressions, UI interactions (such as mouse selection, arrow key  
movement, backspacing), segmentation for vertical text,  
identification of boundaries for first-letter styling, and counting  
“character” positions within text." Using glyphs in this case is not  
appropriate as in text analysis the text itself is not displayed,  
nor is using [aString length] because it just reports the number of  
UTF-16 code units. I realize there is no perfect approach, but I am  
just trying to do something that brings me closest to what a user  
would expect.


Peter confirmed earlier that  
CFStringGetRangeOfComposedCharactersAtIndex would be the way to go  
for me. But, if I read Douglas' comment then I am beginning to  
wonder whether this is the equivalent of UCFindTextBreak's   
kUCTextBreakCharMask and not of kUCTextBreakClusterMask. In the past  
I used to use UCFindTextBreak with kUCTextBreakClusterMask, but  
unlike NSString, UCFindTextBreak is not available on one of the  
platforms I need to support, so what would be the right way to get  
at the cluster breaks using the NSString API? (Please contact me off  
list if you need further clarification.)


Cheers,

david.



David,
CFStringGetRangeOfComposedCharactersAtIndex and -[NSString  
rangeOfComposedCharacterSequenceAtIndex:] are the modern replacements  
for UCFindTextBreak with kUCTextBreakClusterMask, and indeed they are  
now closer to the original intent of kUCTextBreakClusterMask than the  
current implementation of kUCTextBreakClusterMask is (since  
UCFindTextBreak was converted to follow Unicode/ICU default text  
segmentation rules).


The modern functions treat all of the following as a cluster:
- A surrogate pair (of course, since it is a single character);
- A base character followed by a sequence of combining marks (whether  
or not this is something that would be composed under NFC);

- A Hangul syllable expressed as a sequence of conjoining jamo;
- An Indic consonant cluster such as consonant + virama + consonant +  
vowel matra. It is this last kind of cluster that is no longer treated  
as a single entity by UCFindTextBreak with kUCTextBreakClusterMask.
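
To make the difference between code units, code points, and clusters concrete, 
here is a rough sketch in Python. It is not the algorithm used by the Cocoa 
functions above: the count_simple_clusters helper is hypothetical and only 
covers the second case in the list (a base character plus following combining 
marks), with no handling of conjoining jamo or Indic clusters. Surrogate pairs 
need no special handling here because Python strings are sequences of code 
points.

```python
import unicodedata


def utf16_length(text):
    """Number of UTF-16 code units, which is what -[NSString length] reports."""
    return len(text.encode("utf-16-le")) // 2


def count_simple_clusters(text):
    """Count base characters, folding following combining marks into them."""
    count = 0
    for ch in text:
        # A nonzero combining class means the mark attaches to the previous
        # base; a leading mark with no base still counts as one cluster.
        if unicodedata.combining(ch) == 0 or count == 0:
            count += 1
    return count


sample = "e\u0301" + "\U0001D11E"  # 'e' + COMBINING ACUTE ACCENT, then a G clef
units = utf16_length(sample)
code_points = len(sample)
clusters = count_simple_clusters(sample)
print(units, code_points, clusters)
```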


-Peter





Re: How to count composed characters in NSString?

2008-09-28 Thread Peter Edberg


On Sep 28, 2008, at 3:05 PM, Peter Edberg wrote:


David,
Check out CFStringGetRangeOfComposedCharactersAtIndex. It finds the  
kinds of text boundaries that I think you are interested in. You  
would just need to iterate over the string calling this for each  
iteration  to find the next boundary.


Apologies, I see now that in your original post you already  
mentioned rangeOfComposedCharacterSequenceAtIndex. That would be  
preferred :-)

-Peter



Re: How to count composed characters in NSString?

2008-09-28 Thread Peter Edberg


On Sep 28, 2008, at 12:02 PM, [EMAIL PROTECTED] wrote:


--

Message: 1
Date: Sun, 28 Sep 2008 20:17:26 +0200
From: David Niemeijer <[EMAIL PROTECTED]>
Subject: Re: How to count composed characters in NSString?
To: Cocoa-Dev List 
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Michael,

On 28 sep 2008, at 14:41, Michael Gardner wrote:

Upon further investigation, I may be wrong. I based my assertion
upon Apple's NSString documentation ("Returns the number of Unicode
characters in the receiver"), and upon some quick tests I ran. But
this reply made me look into the issue in greater depth.

I re-did my tests more thoroughly, and it does appear that -length
returns the number of 16-bit words (code units), not the number of
Unicode characters (code points), in the string. If this is true, I
would call it a bug either in the code or in the documentation,
which David should submit to Apple.


I think the docs are clear. In the discussion section for "length" it
says: "The number returned includes the individual characters of
composed character sequences, so you cannot use this method to
determine if a string will be visible when printed or how long it will
appear."

I did file a bug (ID 6253075) as you suggested, because I think there
should be a simple API for this.


I apologize for the apparent misinformation in my previous, hasty
reply.


Well, I made an error too. I suggested that on 10.5 the
CFStringTokenizer could be used, but only now noticed that it only
supports larger units (words and up). Thus there is no easy API to
count the number of characters in a way that surrogate pairs or other
"long" unicode characters are treated as a single character.



David,
Check out CFStringGetRangeOfComposedCharactersAtIndex. It finds the  
kinds of text boundaries that I think you are interested in. You would  
just need to iterate over the string calling this for each iteration   
to find the next boundary.


-Peter Edberg, Apple




Re: Localization and plural rules revisited

2008-05-20 Thread Peter Edberg


On May 20, 2008, at 12:48 PM, Ricky Sharp wrote:



On May 20, 2008, at 12:08 PM, Peter Edberg wrote:

CLDR (Common Locale Data Repository) has some draft data on plural  
forms for various languages. See
<http://unicode.org/cldr/data/charts/supplemental/language_plural_rules.html>
for specific rules

and
<http://unicode.org/draft/reports/tr35/tr35.html#Language_Plural_Rules>
for background info.



These are by far the best references I've seen to date.  Thanks!   
Very interesting that Ukrainian has a special rule for fractional  
values vs. integers.



One more interesting case to note (it is not reflected in the CLDR  
rules yet, we are in the midst of enhancing the rule syntax to support  
this) is: For French, all real values of x for 0 <= x < 2 take the  
"one" form. So some decimal fractions in French (e.g. the ones  
represented in English as 0.3, 1.3) take the "one" form while others  
(e.g. 2.3, 3.3) take the "other" form.
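
A minimal Python sketch of the French rule just described, with a
simplified two-form rule alongside for contrast. The function names are
invented for this example, and the contrast rule is an assumption of mine
rather than something taken from the CLDR data.

```python
def plural_category_fr(x):
    """French, per the rule described above: 0 <= x < 2 takes the "one" form."""
    return "one" if 0 <= x < 2 else "other"


def plural_category_simple(x):
    """Contrast case (an assumption, not CLDR data): only exactly 1 is "one"."""
    return "one" if x == 1 else "other"


for value in (0, 0.3, 1, 1.3, 2, 2.3, 3.3):
    print(value, plural_category_fr(value), plural_category_simple(value))
```

The point of the comparison is that a test against the single value 1 is not
enough for French once fractional quantities are allowed.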


-Peter Edberg




Re: Localization and plural rules revisited

2008-05-20 Thread Peter Edberg
CLDR (Common Locale Data Repository) has some draft data on plural  
forms for various languages. See
<http://unicode.org/cldr/data/charts/supplemental/language_plural_rules.html>
for specific rules

and
<http://unicode.org/draft/reports/tr35/tr35.html#Language_Plural_Rules>
for background info.


-Peter Edberg

On May 20, 2008, at 1:48 AM, [EMAIL PROTECTED] wrote:


--

Message: 1
Date: Mon, 19 May 2008 22:02:08 -0500
From: Ricky Sharp <[EMAIL PROTECTED]>
Subject: Localization and plural rules revisited

There was a very short thread on NSLocalizedString regarding plural
rules:

<http://lists.apple.com/archives/cocoa-dev/2007/Oct/msg01208.html>


My app is mainly localization ready, but I still need to modify some
problem code regarding plurals.  Specifically, I have code which takes
some quantity 'n' and pulls an appropriate localized string.

But, in English, there are only two plural rules and thus I currently
only have two separate strings.  While this exact code would work for
a few languages (e.g. Spanish), it would fail for languages with
different quantities of plural rules.

I looked at several of Apple's apps (iTunes, Address Book, Safari) and
there seems to be a different strategy regarding plural rules.  For
example, in English .strings, you have this:

1 song
%d songs

In Russian (which has three plural rules), they kind of "cheated" by
still only having two strings:

1 song
songs: %d


Is this really the common practice?  To basically force all languages
to just use two strings?


I thought of doing the following instead:

Figure out plural rules for given locale.  This rule theoretically
could be an entry in the .strings file itself that could be parsed to
denote a function.  Such a function would return the ordinal "plural
rule" value given some quantity "n".  The ordinal value would then get
appended to the base key.

So, in languages with two rules, you'd have something like this:

"%d song_0" = "%d song";
"%d song_1" = "%d songs";

In languages with 5 plural rules, you'd then have 5 keys ("%d
song_0" .. "%d song_4").
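
The key-suffix idea above could look roughly like the following Python
sketch. Everything here is hypothetical: the rule functions are hard-coded
rather than parsed from a .strings entry, the tables stand in for .strings
files, and the Russian rule and strings are my own illustration of a
three-form language, not anything taken from Apple's apps.

```python
def rule_two_forms(n):
    """Ordinal plural rule for a two-form language such as English."""
    return 0 if n == 1 else 1


def rule_russian(n):
    """Ordinal plural rule for Russian integers (three forms)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Stand-ins for per-language .strings files: (plural rule, key -> string).
STRINGS = {
    "en": (rule_two_forms, {
        "%d song_0": "%d song",
        "%d song_1": "%d songs",
    }),
    "ru": (rule_russian, {
        "%d song_0": "%d песня",
        "%d song_1": "%d песни",
        "%d song_2": "%d песен",
    }),
}


def localized(base_key, n, lang):
    """Append the rule's ordinal value to the base key, then format with n."""
    rule, table = STRINGS[lang]
    key = "%s_%d" % (base_key, rule(n))
    return table[key] % n


for n in (1, 2, 5, 11, 21):
    print(localized("%d song", n, "en"), "|", localized("%d song", n, "ru"))
```

The calling code only ever knows the base key; the number of suffixed keys
is a per-language detail.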

Has anyone else tackled this problem?

___
Ricky A. Sharp mailto:[EMAIL PROTECTED]
Instant Interactive(tm)   http://www.instantinteractive.com




Re: Map key codes to characters

2008-02-23 Thread Peter Edberg

Christian,
The mapping from a sequence of key codes to a sequence of UniChars  
(UTF-16 code units, for the modern APIs) can be complex and in general  
is a state machine in which the inputs include keycode, modifier  
state, dead key state, and physical keyboard ID, and the outputs are  
updated dead key state and a sequence of 0 or more characters. The API  
that performs this mapping of keycode sequences to character sequences  
is UCKeyTranslate (in the CarbonCore framework in the CoreServices  
umbrella).


Consider the following simple sequence of keycode-modifier  
combinations on a US key layout (assuming an ANSI or ISO physical  
keyboard layout and assuming that dead key state is correctly  
maintained from one UCKeyTranslate call to the next):


kVK_ANSI_E (0x0E) + option => no character output
kVK_ANSI_E (0x0E) + shift => 1 UniChar output: 0x00C9
kVK_ANSI_E (0x0E) + option => no character output
kVK_ANSI_X (0x07) + no modifiers => 2 UniChars output: 0x00B4 0x0078

In general there is no simple mapping from a given UniChar back to a  
single keycode. Consider UniChar 0x00B4 above, or perhaps even 0x9053,  
which requires interaction with an input method to produce.
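
To make the state-machine shape concrete, here is a toy Python model of the
four-step sequence above. It is not UCKeyTranslate and does not read a real
key layout: the tables cover only the two keys from the example, physical
keyboard ID is ignored, and the names translate, DEAD_KEYS, PLAIN and
COMPOSE are invented for this sketch.

```python
KVK_ANSI_E = 0x0E
KVK_ANSI_X = 0x07

ACUTE = "acute"  # hypothetical name for the state entered by option-e

# (keycode, modifiers) -> dead-key state entered, with no character output
DEAD_KEYS = {(KVK_ANSI_E, frozenset({"option"})): ACUTE}

# (keycode, modifiers) -> output when no dead key is pending
PLAIN = {
    (KVK_ANSI_E, frozenset()): "e",
    (KVK_ANSI_E, frozenset({"shift"})): "E",
    (KVK_ANSI_X, frozenset()): "x",
}

# dead-key state -> (composition table, standalone form used when nothing composes)
COMPOSE = {ACUTE: ({"e": "\u00E9", "E": "\u00C9"}, "\u00B4")}


def translate(keycode, modifiers, dead_state):
    """Return (new_dead_state, output) for one key event."""
    key = (keycode, frozenset(modifiers))
    if dead_state is None:
        if key in DEAD_KEYS:
            return DEAD_KEYS[key], ""      # enter dead-key state, emit nothing
        return None, PLAIN.get(key, "")
    table, standalone = COMPOSE[dead_state]
    base = PLAIN.get(key, "")              # keys outside the toy tables give ""
    if base in table:
        return None, table[base]           # dead key + base compose to one char
    return None, standalone + base         # no composition: emit both


state = None
events = [
    (KVK_ANSI_E, {"option"}),
    (KVK_ANSI_E, {"shift"}),
    (KVK_ANSI_E, {"option"}),
    (KVK_ANSI_X, set()),
]
for keycode, modifiers in events:
    state, output = translate(keycode, modifiers, state)
    print(hex(keycode), sorted(modifiers), [hex(ord(c)) for c in output])
```

Because 0x00B4 only appears as a side effect of a failed composition, the
toy model also shows why a reverse lookup from a character to a single
keycode is not generally possible.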


Peter Edberg


On Feb 21, 2008, at 1:50 PM, [EMAIL PROTECTED] wrote:


--

Message: 14
Date: Thu, 21 Feb 2008 22:48:15 +0100
From: [EMAIL PROTECTED] (Christian Schmitz)
Subject: Map key codes to characters

Is there a modern API which I can use to map between keys

For example, giving 12 I want to get a "Q" back, and providing "Q" I
get a 12 back.

Of course it would be nice to know the option keys.

Is that possible?

Currently I use iGetKeys, but that is failing for a lot of cases.

