> On Apr 1, 2017, at 4:57 PM, Gerriet M. Denkmann <gerri...@icloud.com> wrote:
> 
> 
>> On 2 Apr 2017, at 06:33, Jens Alfke <j...@mooseyard.com> wrote:
>> 
>> 
>>> On Apr 1, 2017, at 11:58 AM, Gerriet M. Denkmann <gerri...@icloud.com> 
>>> wrote:
>>> 
>>> I think that the examples above show, that NSURL does indeed do something 
>>> about normalising Unicode strings.
>> 
>> That makes sense; I’d expect that one of the RFCs covering URLs describes 
>> normalization. Otherwise constructing URLs (for a REST API, say) could 
>> become quite ambiguous because you wouldn’t know which way to encode various 
>> Unicode characters.
>> 
>>> But my point is that NSURL gets the normalisation wrong in this case; or at 
>>> least that it is not very consistent in normalising strings.
>> 
>> Yes, it does seem wrong that you can have two filenames that are treated as 
>> distinct by the filesystem, but whose URL.path properties produce identical 
>> NSStrings.
> 
> Sorry, my explanation was not quite clear: these two filenames look 
> absolutely identical, but as a sequence of Unicode code points, they are not 
> (tone-mark and vowel are in different order).
> 
> What puzzles me is that consonant + THAI CHARACTER MAI EK + THAI CHARACTER 
> SARA UU gets normalised by NSURL to:  consonant + THAI CHARACTER SARA UU + 
> THAI CHARACTER MAI EK (note the different order), whereas consonant + THAI 
> CHARACTER MAI EK + THAI CHARACTER SARA II is left unchanged.
Gerriet,

This is standard Unicode normalization behavior. Every Unicode character is 
assigned a Canonical Combining Class, an integer value that determines the 
canonical ordering of combining marks.

The Canonical Combining Class of THAI CHARACTER SARA UU is 103, and that of 
THAI CHARACTER MAI EK is 107. So when the two are adjacent, MAI EK always 
comes after SARA UU in canonical order.
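
For anyone who wants to check these values, here is a minimal sketch in 
Python using the standard unicodedata module (this is not what NSURL runs 
internally, just an independent look at the same Unicode data). The consonant 
KO KAI is an arbitrary choice made for illustration.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (arbitrary consonant, an assumption)
SARA_UU = "\u0e39"  # THAI CHARACTER SARA UU
MAI_EK = "\u0e48"   # THAI CHARACTER MAI EK

# unicodedata.combining() returns the canonical combining class as an int.
ccc_sara_uu = unicodedata.combining(SARA_UU)
ccc_mai_ek = unicodedata.combining(MAI_EK)
print(ccc_sara_uu, ccc_mai_ek)

# Tone mark typed before the vowel: normalization swaps the two marks,
# because the lower combining class must come first.
tone_first = KO_KAI + MAI_EK + SARA_UU
normalized = unicodedata.normalize("NFD", tone_first)
print([unicodedata.name(ch) for ch in normalized])
```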

On the other hand, THAI CHARACTER SARA II has Canonical Combining Class 0, 
which makes it a starter: it begins a new reordering segment, and marks are 
never moved across it. That’s why MAI EK followed by SARA II is left 
unchanged.
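
The reordering step itself can be sketched roughly as follows, again in 
Python. This is a simplified illustration only: canonical_reorder is a 
hypothetical helper name, it skips the decomposition step of full 
normalization, and it is not the code NSURL or Foundation uses.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of non-starters by canonical combining class."""
    result = []
    run = []  # pending marks with a nonzero combining class
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter (class 0) closes the current run; marks never cross it.
            result.extend(sorted(run, key=unicodedata.combining))
            run = []
            result.append(ch)
        else:
            run.append(ch)
    # Flush any marks left at the end of the string.
    result.extend(sorted(run, key=unicodedata.combining))
    return "".join(result)

# KO KAI is an arbitrary consonant chosen for illustration.
KO_KAI, MAI_EK, SARA_II, SARA_UU = "\u0e01", "\u0e48", "\u0e35", "\u0e39"

with_sara_ii = KO_KAI + MAI_EK + SARA_II  # SARA II is a starter: order kept
with_sara_uu = KO_KAI + MAI_EK + SARA_UU  # both are non-starters: swapped

print(canonical_reorder(with_sara_ii) == with_sara_ii)
print(canonical_reorder(with_sara_uu) == KO_KAI + SARA_UU + MAI_EK)
```

For these particular strings the sketch agrees with 
unicodedata.normalize("NFD", ...), since none of the characters involved 
has a canonical decomposition.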

Aki

> 
> 
>> (I assume you’ve been following the recent thread here about potential 
>> Unicode problems with the APFS filesystem in iOS 10.3? It sounds like things 
>> might become even more confusing.)
> 
> Yes, indeed I have. That started me worrying and looking into these 
> normalisation issues.
> 
> 
> Kind regards,
> 
> Gerriet.
> 
> 
> _______________________________________________
> 
> Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
> 
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
> 
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/cocoa-dev/aki%40apple.com
> 
> This email sent to a...@apple.com


