Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness

2008-06-18 Thread Ken Thomases

On Jun 18, 2008, at 1:49 PM, JongAm Park wrote:

> Can anyone tell me why the two different data sources are displayed
> as the same 자연, while what they contain is different?


I haven't looked into the specific character sequences in-depth, but  
I suspect the difference is in Normalization Forms.  Specifically,  
form C vs. D.


http://unicode.org/reports/tr15/

The idea is that the same character can be obtained from a single  
code point or by several combining code points.


In Cocoa, see -precomposedStringWithCanonicalMapping and
-decomposedStringWithCanonicalMapping.
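
As a rough illustration of form C versus form D, here is a minimal sketch in Python rather than Cocoa. It assumes that `unicodedata.normalize` with "NFC" and "NFD" is a fair stand-in for the two NSString methods named above; the variable names are only for illustration.

```python
import unicodedata

# The word from the question, written as two precomposed Hangul syllables.
word = "\uc790\uc5f0"  # 자연

# Form C (composed) and form D (decomposed) of the same text.
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)

# Both render identically, but the code point sequences differ.
print(len(nfc), [f"U+{ord(c):04X}" for c in nfc])
print(len(nfd), [f"U+{ord(c):04X}" for c in nfd])

# A plain code point comparison sees two different strings.
print(nfc == nfd)

# Recomposing the decomposed form gives back the composed form.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

The two forms display the same way, which is why the difference only shows up when you look at the code points or the encoded bytes.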


Cheers,
Ken
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to [EMAIL PROTECTED]


Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness

2008-06-18 Thread Christopher Nebel

On Jun 18, 2008, at 12:24 PM, Ken Thomases wrote:


> On Jun 18, 2008, at 1:49 PM, JongAm Park wrote:
>
>> Can anyone tell me why the two different data sources are displayed
>> as the same 자연, while what they contain is different?
>
> I haven't looked into the specific character sequences in-depth, but
> I suspect the difference is in Normalization Forms.  Specifically,
> form C vs. D.
>
> http://unicode.org/reports/tr15/
>
> The idea is that the same character can be obtained from a single
> code point or by several combining code points.
>
> In Cocoa, see -precomposedStringWithCanonicalMapping and
> -decomposedStringWithCanonicalMapping.


Sure looks like it, based on the data.  EC 9E 90 is U+C790, 자; E1  
84 8C E1 85 A1 is U+110C ᄌ, U+1161 ᅡ, which is the decomposed  
version of the same thing.  -[NSString fileSystemRepresentation] may  
also be of use here, given that this is really a file path -- the  
normalization form used for file names is dictated by the file system.
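
The byte sequences above can be checked with a short sketch. This is Python, not Cocoa, and `hex_bytes` is a hypothetical helper written only for this illustration.

```python
import unicodedata

composed = "\uc790"  # 자 as the single code point U+C790
decomposed = unicodedata.normalize("NFD", composed)  # U+110C followed by U+1161


def hex_bytes(s):
    """Return the UTF-8 encoding of s as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))


print(hex_bytes(composed))    # three bytes for the precomposed syllable
print(hex_bytes(decomposed))  # six bytes for the two conjoining jamo
```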



--Chris Nebel
AppleScript Engineering



Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness

2008-06-18 Thread JongAm Park

Thank you very much for the information.

I hadn't even thought about normalization. Wow, it is quite complicated.
I tried the four methods,
-precomposedStringWith[Canonical/Compatibility]Mapping and
-decomposedStringWith[Canonical/Compatibility]Mapping.

The result was that [NSString UTF8String] returns the precomposed version,
while the one used in FCP was decomposed.


Thank you again.



Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness

2008-06-18 Thread Ken Thomases

On Jun 18, 2008, at 3:47 PM, JongAm Park wrote:


> Thank you very much for the information.


You're welcome.


> I hadn't even thought about normalization. Wow, it is quite
> complicated.
> I tried the four methods,
> -precomposedStringWith[Canonical/Compatibility]Mapping and
> -decomposedStringWith[Canonical/Compatibility]Mapping.
>
> The result was that [NSString UTF8String] returns the precomposed
> version


That's not quite accurate.  Any given string will be in precomposed  
or decomposed form (or it might not be normalized to either form, and  
have a mix).  Whatever form that string is in, -UTF8String will  
maintain it.  So, -UTF8String doesn't necessarily return
precomposed form; it just so happens that the string you got was
already in precomposed form.
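
To make that concrete, here is a minimal sketch in Python. It assumes that plain UTF-8 encoding via `str.encode` behaves like -UTF8String in this respect, that is, it encodes whatever code points are present without normalizing them.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "\uc790")    # one code point
decomposed = unicodedata.normalize("NFD", "\uc790")  # two code points
mixed = composed + decomposed                        # not normalized to either form

for s in (composed, decomposed, mixed):
    # Encoding to UTF-8 and decoding again leaves the code points untouched,
    # so the string keeps whatever form it had going in.
    round_tripped = s.encode("utf-8").decode("utf-8")
    print([f"U+{ord(c):04X}" for c in round_tripped], round_tripped == s)
```

The encoder is not choosing a form; the form is a property of the string it was handed.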




> ... while the one used in FCP was decomposed.


The low-level file-system APIs on Mac OS X use what Apple calls
file-system representation, which is mostly decomposed (NFD) with some
specific exceptions.  So, any time you obtain a file name from the  
file-system -- by enumerating a directory or from an NSOpenPanel, for  
example -- it's likely to be mostly decomposed.  This is true even if  
the name originally used to create the file was passed in precomposed  
form.


If you want the string in a specific normalization form for some  
reason, you need to transform it using the above methods.  Don't rely  
on file-system representation being in any particular form.  You  
can compare strings without regard for normalization form using one  
of the -compare:... methods and _not_ specifying NSLiteralSearch.   
Note that isEqual: and isEqualToString: _do_ specify NSLiteralSearch  
(or the equivalent) and so can report NO for two strings which  
display identically.
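
The difference between a literal comparison and a normalization-insensitive one can be sketched as follows. This is Python, not Cocoa; `canonically_equal` is a hypothetical helper that normalizes both sides first, which is only an analogue of calling -compare: without NSLiteralSearch, not a description of how NSString implements it.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


from_user = "\uc790\uc5f0"                            # 자연, precomposed
from_disk = unicodedata.normalize("NFD", from_user)   # same text, decomposed

# Literal comparison: code point by code point, so the strings differ.
print(from_user == from_disk)

# Normalization-insensitive comparison: the strings are equivalent.
print(canonically_equal(from_user, from_disk))
```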


Cheers,
Ken


Re: [Q] UTF-8, stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding, weirdness

2008-06-18 Thread JongAm Park

Thank you for the additional information.

Interestingly, I found a similar method in NSFileManager,
fileSystemRepresentationWithPath:.
So, is there any document describing which file system uses which representation?

Also, is there any reason why a program like FCP prefers the decomposed
string over the precomposed one?
Will there be a problem if a program expects a decomposed string and its client
program sends a precomposed one to the server?
(Well, it depends on the implementation of the programs.)

> Sure looks like it, based on the data.  EC 9E 90 is U+C790, 자; E1
> 84 8C E1 85 A1 is U+110C ᄌ, U+1161 ᅡ, which is the decomposed
> version of the same thing.  -[NSString fileSystemRepresentation] may
> also be of use here, given that this is really a file path -- the
> normalization form used for file names is dictated by the file system.
>
> --Chris Nebel
> AppleScript Engineering




Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness

2008-06-18 Thread JongAm Park


>> I hadn't even thought about normalization. Wow, it is quite
>> complicated.
>> I tried the four methods,
>> -precomposedStringWith[Canonical/Compatibility]Mapping and
>> -decomposedStringWith[Canonical/Compatibility]Mapping.
>>
>> The result was that [NSString UTF8String] returns the precomposed version
>
> That's not quite accurate.  Any given string will be in precomposed or
> decomposed form (or it might not be normalized to either form, and
> have a mix).  Whatever form that string is in, -UTF8String will
> maintain it.  So, -UTF8String doesn't necessarily return precomposed
> form; it just so happens that the string you got was already in
> precomposed form.



You are right. It depends on the original string. NSString is quite
smart...



>> ... while the one used in FCP was decomposed.
>
> The low-level file-system APIs on Mac OS X use what Apple calls
> file-system representation, which is mostly decomposed (NFD) with
> some specific exceptions.  So, any time you obtain a file name from
> the file-system -- by enumerating a directory or from an NSOpenPanel,
> for example -- it's likely to be mostly decomposed.  This is true even
> if the name originally used to create the file was passed in
> precomposed form.
>
> If you want the string in a specific normalization form for some
> reason, you need to transform it using the above methods.  Don't rely
> on file-system representation being in any particular form.  You can
> compare strings without regard for normalization form using one of the
> -compare:... methods and _not_ specifying NSLiteralSearch.  Note that
> isEqual: and isEqualToString: _do_ specify NSLiteralSearch (or the
> equivalent) and so can report NO for two strings which display
> identically.
>
> Cheers,
> Ken

I tested with the compare: method. It returned NSOrderedSame when a
decomposed string was compared with a precomposed one.
So, when Unicode text has to be handled, it is safer to use compare:
instead of isEqual:.

(NSString even provides comparison with localized strings. I'm
impressed!)

Thank you for the good information. Although I have used NSString, I
didn't know what those methods really meant. Now my eyes are open!



Re: [Q] UTF-8, stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding, weirdness

2008-06-18 Thread Michael Ash
On Wed, Jun 18, 2008 at 5:07 PM, JongAm Park
[EMAIL PROTECTED] wrote:
> Thank you for the additional information.
>
> Interestingly, I found a similar method in NSFileManager,
> fileSystemRepresentationWithPath:.
> So, is there any document describing which file system uses which representation?

Filesystem representation is a property of the OS, not of an
individual filesystem. It is not the representation of one filesystem,
but rather the representation used by the filesystem as a whole, as
opposed to pieces of the system unrelated to files.

The NSString and NSFileManager methods produce identical results as
far as I know, so use whichever one you like better.

> Also, is there any reason why a program like FCP prefers the
> decomposed string over the precomposed one?
> Will there be a problem if a program expects a decomposed string and its
> client program sends a precomposed one to the server?
> (Well, it depends on the implementation of the programs.)

A properly written program will accept any equivalent form of a string
as being the same as any other, unless it specifically says that it
requires a particular form. That said, many programs are buggy. The
Mac OS X kernel was a great example of this. Up to, I think, 10.2, the
kernel didn't properly convert all Unicode sequences to the preferred
filesystem form, leading to buggy behavior if you gave it sequences it
didn't like. On more recent versions you can use any valid UTF-8 and
the kernel will convert it to the internal format that it requires.

In conclusion, you can probably send any form, but if you want to be
really careful, figure out what form the remote program prefers and
convert to that form before sending.
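
One way to follow that advice is to normalize right at the point where the string is serialized. The following is a minimal Python sketch; `encode_for_peer` and its `preferred_form` parameter are hypothetical names, and which form the remote program actually prefers is something you would have to find out separately.

```python
import unicodedata


def encode_for_peer(text, preferred_form="NFC"):
    """Normalize text to the form the remote side is assumed to prefer,
    then encode it as UTF-8 for sending."""
    return unicodedata.normalize(preferred_form, text).encode("utf-8")


# A decomposed name, as it might come back from a directory listing.
name_from_disk = "\u110c\u1161"

print(encode_for_peer(name_from_disk))          # composed before sending
print(encode_for_peer(name_from_disk, "NFD"))   # left decomposed
```

Doing the conversion in one place keeps the rest of the program free to hold strings in whatever form they arrived in.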

Mike