Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
On Jun 18, 2008, at 1:49 PM, JongAm Park wrote:

> Can anyone tell me why the two different data sources are displayed as the same 자연, while what they contain is different?

I haven't looked into the specific character sequences in depth, but I suspect the difference is in Normalization Forms. Specifically, form C vs. D.

http://unicode.org/reports/tr15/

The idea is that the same character can be obtained from a single code point or by several combining code points. In Cocoa, see -precomposedStringWithCanonicalMapping and -decomposedStringWithCanonicalMapping.

Cheers,
Ken

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to [EMAIL PROTECTED]
Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
On Jun 18, 2008, at 12:24 PM, Ken Thomases wrote:

> On Jun 18, 2008, at 1:49 PM, JongAm Park wrote:
>
>> Can anyone tell me why the two different data sources are displayed as the same 자연, while what they contain is different?
>
> I haven't looked into the specific character sequences in depth, but I suspect the difference is in Normalization Forms. Specifically, form C vs. D.
>
> http://unicode.org/reports/tr15/
>
> The idea is that the same character can be obtained from a single code point or by several combining code points. In Cocoa, see -precomposedStringWithCanonicalMapping and -decomposedStringWithCanonicalMapping.

Sure looks like it, based on the data. EC 9E 90 is U+C790, 자; E1 84 8C E1 85 A1 is U+110C ᄌ, U+1161 ᅡ, which is the decomposed version of the same thing.

-[NSString fileSystemRepresentation] may also be of use here, given that this is really a file path; the normalization form used for file names is dictated by the file system.

--Chris Nebel
AppleScript Engineering
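Chris's byte-level reading can be checked outside Cocoa. The sketch below uses Python's standard unicodedata and urllib.parse modules as stand-ins (an assumption for illustration; these are not the Cocoa APIs discussed in the thread): unicodedata.normalize plays the role of -precomposedStringWithCanonicalMapping / -decomposedStringWithCanonicalMapping, and quote plays the role of percent-escaping the UTF-8 bytes. It shows why the same visible 자연 yields different bytes, and therefore different percent escapes, in the two forms.

```python
import unicodedata
from urllib.parse import quote

# 자연 spelled with precomposed Hangul syllables (NFC): U+C790 U+C5F0.
nfc = unicodedata.normalize("NFC", "\uC790\uC5F0")
# The canonically equivalent decomposed spelling (NFD): conjoining jamo.
nfd = unicodedata.normalize("NFD", nfc)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print("NFC:", len(nfc), "code points,", nfc_bytes.hex(" "))
print("NFD:", len(nfd), "code points,", nfd_bytes.hex(" "))

# Percent-escaping works on the UTF-8 bytes, so the two spellings give
# different escaped strings even though they render identically.
print(quote(nfc))
print(quote(nfd))
print(nfc == nfd)  # False: the code point sequences differ
```

The leading bytes of the two encodings are the EC 9E 90 and E1 84 8C E1 85 A1 sequences Chris identified for 자.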
Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
Thank you very much for the information. I didn't even think about normalization. Wow, it is quite complicated.

I tried the four methods, -precomposedStringWith[Canonical/Compatibility]Mapping and -decomposedStringWith[Canonical/Compatibility]Mapping. The result was that [NSString UTF8String] returns the precomposed version, while the one used in FCP was decomposed.

Thank you again.
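The four methods JongAm tried correspond to the four Unicode normalization forms from TR15: canonical composition (NFC), canonical decomposition (NFD), and their compatibility counterparts (NFKC, NFKD). A minimal Python sketch, assuming Python's unicodedata as a stand-in for the NSString methods, shows how they differ. The "fi" ligature U+FB01 is an illustrative character (not from the thread), chosen because only the compatibility forms change it.

```python
import unicodedata

# NSString method (as named in the thread) -> Unicode normalization form.
FORMS = {
    "precomposedStringWithCanonicalMapping": "NFC",
    "decomposedStringWithCanonicalMapping": "NFD",
    "precomposedStringWithCompatibilityMapping": "NFKC",
    "decomposedStringWithCompatibilityMapping": "NFKD",
}

# 자 (U+C790) followed by the "fi" ligature (U+FB01).
sample = "\uC790\uFB01"

results = {}
for method, form in FORMS.items():
    out = unicodedata.normalize(form, sample)
    results[form] = out
    code_points = " ".join(f"U+{ord(c):04X}" for c in out)
    print(f"{method:43} {form:4} {code_points}")
```

The canonical forms only join or split the Hangul jamo; the compatibility forms additionally fold the ligature into a plain "f" and "i", which is why the canonical pair is the one that matters for the 자연 mismatch.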
Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
On Jun 18, 2008, at 3:47 PM, JongAm Park wrote:

> Thank you very much for the information.

You're welcome.

> I didn't even think about normalization. Wow, it is quite complicated. I tried the four methods, -precomposedStringWith[Canonical/Compatibility]Mapping and -decomposedStringWith[Canonical/Compatibility]Mapping. The result was that [NSString UTF8String] returns the precomposed version

That's not quite accurate. Any given string will be in precomposed or decomposed form (or it might not be normalized to either form, and have a mix). Whatever form that string is in, -UTF8String will maintain it. So, -UTF8String doesn't necessarily return precomposed form; it just so happens that the string you got was already in precomposed form.

> , while the one used in FCP was decomposed.

The low-level file-system APIs on Mac OS X use what Apple calls file-system representation, which is mostly decomposed (NFD) with some specific exceptions. So, any time you obtain a file name from the file system (by enumerating a directory or from an NSOpenPanel, for example), it's likely to be mostly decomposed. This is true even if the name originally used to create the file was passed in precomposed form.

If you want the string in a specific normalization form for some reason, you need to transform it using the above methods. Don't rely on file-system representation being in any particular form.

You can compare strings without regard for normalization form using one of the -compare:... methods and _not_ specifying NSLiteralSearch. Note that -isEqual: and -isEqualToString: _do_ specify NSLiteralSearch (or the equivalent) and so can report NO for two strings which display identically.

Cheers,
Ken
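Ken's distinction between literal and normalization-insensitive comparison can be sketched in Python (again a stand-in, not Cocoa): == compares code point by code point, much as -isEqualToString: does, while the hypothetical helper same_text normalizes both sides first. This is only a rough analogue of -compare: without NSLiteralSearch; NSString's own comparison logic is not reproduced here.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: True if a and b are canonically equivalent.

    Both strings are normalized to the same form before the literal
    comparison, so precomposed and decomposed spellings compare equal.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\uC790"       # 자 as one code point (U+C790)
decomposed = "\u110C\u1161"  # U+110C + U+1161, canonically equivalent

print(precomposed == decomposed)                  # False
print(same_text(precomposed, decomposed))         # True
print(same_text(precomposed, decomposed, "NFD"))  # True
```

Either target form gives the same answer, because canonically equivalent strings have identical NFC forms and identical NFD forms. What matters is picking one form and applying it consistently, for example whenever such strings are used as dictionary keys.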
Re: [Q] UTF-8, stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding, weirdness
Thank you for the additional information.

Interestingly, I found a similar method in NSFileManager, -fileSystemRepresentationWithPath:. So, is there any document on which file system uses which representation?

Also, is there any reason why a program like FCP prefers the decomposed string over the precomposed string? Will there be a problem if a program expects a decomposed string and its client program sends a precomposed one to the server? (Well... it depends on the implementation of the programs.)
Re: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
>> I didn't even think about normalization. Wow, it is quite complicated. I tried the four methods, -precomposedStringWith[Canonical/Compatibility]Mapping and -decomposedStringWith[Canonical/Compatibility]Mapping. The result was that [NSString UTF8String] returns the precomposed version
>
> That's not quite accurate. Any given string will be in precomposed or decomposed form (or it might not be normalized to either form, and have a mix). Whatever form that string is in, -UTF8String will maintain it. So, -UTF8String doesn't necessarily return precomposed form; it just so happens that the string you got was already in precomposed form.

You are right. It depends on the original string. NSString is quite smart...

>> , while the one used in FCP was decomposed.
>
> The low-level file-system APIs on Mac OS X use what Apple calls file-system representation, which is mostly decomposed (NFD) with some specific exceptions. So, any time you obtain a file name from the file system (by enumerating a directory or from an NSOpenPanel, for example), it's likely to be mostly decomposed. This is true even if the name originally used to create the file was passed in precomposed form.
>
> If you want the string in a specific normalization form for some reason, you need to transform it using the above methods. Don't rely on file-system representation being in any particular form.
>
> You can compare strings without regard for normalization form using one of the -compare:... methods and _not_ specifying NSLiteralSearch. Note that -isEqual: and -isEqualToString: _do_ specify NSLiteralSearch (or the equivalent) and so can report NO for two strings which display identically.
>
> Cheers,
> Ken

I tested with the compare: method. It returned NSOrderedSame when a decomposed string was compared with a precomposed string. So, when handling Unicode, it is safer to use compare: instead of isEqual:. (NSString even provides locale-aware comparison. I'm impressed!)

Thank you for the good information. Although I have used NSString before, I didn't know what those methods really meant. But now my eyes are opened!
Re: [Q] UTF-8, stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding, weirdness
On Wed, Jun 18, 2008 at 5:07 PM, JongAm Park [EMAIL PROTECTED] wrote:

> Thank you for the additional information. Interestingly, I found a similar method in NSFileManager, -fileSystemRepresentationWithPath:. So, is there any document on which file system uses which representation?

Filesystem representation is a property of the OS, not of an individual filesystem. It is not the representation of one filesystem, but rather the representation used by the filesystem layer as a whole, as opposed to pieces of the system unrelated to files. The NSString and NSFileManager methods produce identical results as far as I know, so use whichever one you like better.

> Also, is there any reason why a program like FCP prefers the decomposed string over the precomposed string? Will there be a problem if a program expects a decomposed string and its client program sends a precomposed one to the server? (Well... it depends on the implementation of the programs.)

A properly written program will accept any equivalent form of a string as being the same as any other, unless it specifically says that it requires a particular form. That said, many programs are buggy. The Mac OS X kernel was a great example of this. Up to, I think, 10.2, the kernel didn't properly convert all Unicode sequences to the preferred filesystem form, leading to buggy behavior if you gave it sequences it didn't like. On more recent versions you can use any valid UTF-8 and the kernel will convert it to the internal format that it requires.

In conclusion, you can probably send any form, but if you want to be really careful, figure out what form the remote program prefers and convert to that form before sending.

Mike
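Mike's advice (accept any equivalent form on input; convert to the receiver's preferred form before sending) amounts to normalizing at the boundaries. A minimal Python sketch, in which NameRegistry, prepare_for_peer, and the "clip-1" value are hypothetical, and NFC is an arbitrary choice of internal form:

```python
import unicodedata
from typing import Dict, Optional


class NameRegistry:
    """Hypothetical receiver that treats all canonically equivalent
    spellings of a name as the same key by normalizing to NFC on input."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        return unicodedata.normalize("NFC", name)

    def add(self, name: str, value: str) -> None:
        self._items[self._key(name)] = value

    def lookup(self, name: str) -> Optional[str]:
        return self._items.get(self._key(name))


def prepare_for_peer(name: str, peer_form: str) -> str:
    """Hypothetical sender-side helper: convert to the normalization form
    the receiving program is known to expect ("NFC" or "NFD")."""
    return unicodedata.normalize(peer_form, name)


name = "\uC790\uC5F0"                       # 자연, precomposed
decomposed = prepare_for_peer(name, "NFD")  # same text, decomposed

plain = {name: "clip-1"}
print(plain.get(decomposed))        # None: a plain dict compares keys literally

registry = NameRegistry()
registry.add(name, "clip-1")
print(registry.lookup(decomposed))  # clip-1
```

The plain dictionary fails for the same reason -isEqualToString: reports NO: keys are compared literally. Normalizing once at the boundary keeps every later lookup and comparison simple.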