Hello, all.

I found a very interesting and strange behaviour of -[NSString stringByAddingPercentEscapesUsingEncoding:].

I got a UTF-8 string from a Final Cut Pro project file exported as XML.
It contains a video clip named "자연", which means "Nature" in Korean.
Its pathurl is file://localhost/Users/young/Movies/%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB.mov
The 자연 part is %E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB,
so it is a percent-escaped string.
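
As an illustration, decoding that escaped segment back to raw bytes gives fifteen bytes: E1 84 8C E1 85 A1 E1 84 8B E1 85 A7 E1 86 AB. A minimal C sketch; percent_decode and hex_value are hypothetical helper names, not Cocoa API:

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a percent-escaped string into raw bytes. Writes at most cap bytes
   to out and returns how many were written. A '%' that is not followed by
   two hex digits is copied through unchanged. */
static size_t percent_decode(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*in != '\0' && n < cap) {
        int hi, lo;
        if (in[0] == '%'
            && (hi = hex_value((unsigned char)in[1])) >= 0
            && (lo = hex_value((unsigned char)in[2])) >= 0) {
            out[n++] = (unsigned char)(hi * 16 + lo);
            in += 3;
        } else {
            out[n++] = (unsigned char)*in++;
        }
    }
    return n;
}
```

Calling percent_decode on "%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB" yields exactly the byte array used for `test` further down.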

So I tried to get a UTF-8 version of "자연" in either of two ways:

1. NSString *anUTF16String = [NSString stringWithString:@"자연"];
const char *anUTF8CString = [anUTF16String UTF8String]; // UTF8String returns a C string, not an NSString

or
2. NSString *anUTF8String = [NSString stringWithUTF8String:"자연"];

Both gave the same bytes.

Then I made a percent-escaped string by calling:
NSString *anUTF8PercentEscapedString = [anUTF8String stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
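
For non-ASCII text, percent escaping works byte by byte on the UTF-8 encoding: each byte becomes "%" plus two uppercase hex digits. A minimal C sketch; percent_escape is a hypothetical helper, and which ASCII characters Cocoa additionally escapes is an assumption this sketch does not try to reproduce:

```c
#include <stddef.h>

/* Percent-escape a NUL-terminated byte string. Every byte that is not
   printable ASCII (<= 0x20 or >= 0x7F), plus '%', becomes "%XX" with
   uppercase hex digits. ASSUMPTION: Cocoa's exact set of escaped ASCII
   characters differs; only the non-ASCII behaviour is the point here.
   out needs room for 3 * strlen(in) + 1 bytes. */
static void percent_escape(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    for (; *in != '\0'; in++) {
        unsigned char b = (unsigned char)*in;
        if (b <= 0x20 || b >= 0x7F || b == '%') {
            *out++ = '%';
            *out++ = hex[b >> 4];
            *out++ = hex[b & 0x0F];
        } else {
            *out++ = (char)b;
        }
    }
    *out = '\0';
}
```

Escaping the bytes EC 9E 90 EC 97 B0 this way produces "%EC%9E%90%EC%97%B0", which is why the result never matches the FCP pathurl: the escaping is faithful, so the difference must already be in the bytes.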

Then I tried converting it back to the original string by calling:
NSString *revertedUTF8String = [anUTF8PercentEscapedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

This gave back the same data as in 1 or 2 above.
To check which bytes it contains, I called:

3. const char *revertedCStringOne = [revertedUTF8String cStringUsingEncoding:NSUTF8StringEncoding];

The bytes were: EC 9E 90 EC 97 B0
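
For comparison, encoding the precomposed Hangul syllables 자 (U+C790) and 연 (U+C5F0) as UTF-8 by hand reproduces exactly these six bytes. A minimal C sketch; utf8_encode is a hypothetical helper, and the input is assumed to be a valid Unicode scalar value:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8. ASSUMPTION: cp is a valid scalar value
   (U+0000..U+10FFFF, not a surrogate). Returns the byte count (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

U+C790 encodes to EC 9E 90 and U+C5F0 to EC 97 B0.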

As I mentioned above, the pathurl in the FCP project looks different from result 3.
So I tried building the Korean part of the pathurl directly from its bytes:

char test[] = "\xE1\x84\x8C\xE1\x85\xA1\xE1\x84\x8B\xE1\x85\xA7\xE1\x86\xAB"; // same bytes as the pathurl, NUL-terminated
size_t length = strlen( test );
size_t i;
for( i = 0; i < length; i++ )
{
NSLog(@"%02X", (unsigned char)test[i] ); // the cast avoids sign extension (FFFFFFE1)
}
printf("\n");

// 4. It prints the same "자연"
NSString *questionedString = [NSString stringWithUTF8String:test];
NSLog(@"Questioned String = %@", questionedString );
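
Printing bytes stored in a plain char with %X has a pitfall: where char is signed, 0xE1 is promoted to a negative int and typically prints as FFFFFFE1. A minimal C sketch that reads each byte through unsigned char; hex_dump is a hypothetical helper:

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes as space-separated two-digit uppercase hex, e.g.
   "E1 84 8C". Each byte is read as unsigned char so that values >= 0x80
   are not sign-extended. out needs room for 3 * len bytes (at least 1). */
static void hex_dump(const char *bytes, size_t len, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        sprintf(out + 3 * i, i + 1 < len ? "%02X " : "%02X",
                (unsigned int)(unsigned char)bytes[i]);
    }
}
```

With the `test` bytes above this gives "E1 84 8C E1 85 A1 E1 84 8B E1 85 A7 E1 86 AB" on one line.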

When questionedString is converted to a percent-escaped string by calling:
NSString *questionedPercentEscapedString = [questionedString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", questionedPercentEscapedString);

The result was the same as the pathurl in the FCP project, i.e. it starts with %E1%84%8C.
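
Decoding both byte sequences into code points shows what each one contains: EC 9E 90 EC 97 B0 is two code points, U+C790 U+C5F0, while the FCP bytes are five, U+110C U+1161 U+110B U+1167 U+11AB (the conjoining jamo for ㅈ ㅏ ㅇ ㅕ ㄴ). A minimal C decoder; utf8_decode is a hypothetical helper and assumes well-formed UTF-8:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 into code points. ASSUMPTION: the input is well-formed;
   only a truncated final sequence is detected (decoding stops there).
   Returns the number of code points written to out (at most cap). */
static size_t utf8_decode(const unsigned char *s, size_t len,
                          uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len && n < cap) {
        unsigned char b = s[i];
        size_t extra;
        uint32_t cp;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        if (i + extra >= len)
            break;                      /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```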

Can anyone tell me why these two different byte sequences are both displayed as "자연" even though their contents differ? I would like to send an Apple event to Final Cut Pro, but I'm not sure whether to send the percent-escaped string produced as in 1 or 2, or the one from the FCP project. (I don't know how to generate a string like the one in the FCP project XML file.)
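
A likely explanation, offered as an assumption rather than something established above: the two sequences are canonically equivalent Unicode. EC 9E 90 EC 97 B0 is the precomposed form, while the FCP bytes are the same syllables decomposed into conjoining jamo, which is the kind of decomposed form the Mac OS X HFS+ file system uses for file names. If so, -[NSString decomposedStringWithCanonicalMapping] is probably the simplest way to get the FCP form in Cocoa. The C sketch below (all helper names hypothetical) covers only the Hangul part of canonical decomposition, using the arithmetic from the Unicode Standard, and then percent-escapes the UTF-8 result:

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT,      /* 588 */
    S_COUNT = L_COUNT * N_COUNT       /* 11172 */
};

/* Decompose one code point. A precomposed Hangul syllable becomes 2 or 3
   conjoining jamo; anything else is copied unchanged (non-Hangul
   decompositions are deliberately ignored). Returns the count in out. */
static size_t hangul_decompose(uint32_t cp, uint32_t out[3])
{
    if (cp < S_BASE || cp >= S_BASE + S_COUNT) { out[0] = cp; return 1; }
    uint32_t s = cp - S_BASE;
    out[0] = L_BASE + s / N_COUNT;
    out[1] = V_BASE + (s % N_COUNT) / T_COUNT;
    if (s % T_COUNT == 0) return 2;
    out[2] = T_BASE + s % T_COUNT;
    return 3;
}

/* Append one BMP code point as UTF-8, writing each non-ASCII byte as
   "%XX". ASSUMPTION: cp < 0x10000, which holds for all Hangul. */
static char *append_escaped(uint32_t cp, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char b[3];
    size_t n;
    if (cp < 0x80) {
        b[0] = (unsigned char)cp; n = 1;
    } else if (cp < 0x800) {
        b[0] = (unsigned char)(0xC0 | (cp >> 6));
        b[1] = (unsigned char)(0x80 | (cp & 0x3F)); n = 2;
    } else {
        b[0] = (unsigned char)(0xE0 | (cp >> 12));
        b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[2] = (unsigned char)(0x80 | (cp & 0x3F)); n = 3;
    }
    for (size_t i = 0; i < n; i++) {
        if (b[i] < 0x80) { *out++ = (char)b[i]; continue; }
        *out++ = '%';
        *out++ = hex[b[i] >> 4];
        *out++ = hex[b[i] & 0x0F];
    }
    return out;
}

/* Decompose each code point, then write the percent-escaped UTF-8 of the
   result into out as a NUL-terminated string. */
static void escape_decomposed(const uint32_t *cps, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t jamo[3];
        size_t n = hangul_decompose(cps[i], jamo);
        for (size_t k = 0; k < n; k++)
            out = append_escaped(jamo[k], out);
    }
    *out = '\0';
}
```

Feeding it U+C790 U+C5F0 produces "%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB", the same string as in the FCP pathurl.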

I also tried a Java applet, http://www.profitcode.net/resources/tools/utf8_encoder_applet.html, and its result is the same as in 1 or 2 above, so it also differs from the one in the FCP project.

I would appreciate any help.

Thank you.


P.S. Here is my whole code, just in case.

-----------------------------------------------------------------------------------------------------------------------------------------------

NSString *anUTF16String = [NSString stringWithString:@"자연"];
//NSString *anUTF16PercentEscapedString = [anUTF16String stringByAddingPercentEscapesUsingEncoding:NSUTF16StringEncoding];
//char *UTF16CString = (char *)[anUTF16String cStringUsingEncoding:NSUTF16StringEncoding];

// 1. Making an NSString object with a UTF8 encoding
NSString *anUTF8String = [NSString stringWithUTF8String:"자연"];
NSString *anUTF8PercentEscapedString = [anUTF8String stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *revertedUTF8String = [anUTF8PercentEscapedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
const char *revertedCStringOne = [revertedUTF8String cStringUsingEncoding:NSUTF8StringEncoding];

NSLog(@"UTF-16 : %@", anUTF16String );
//NSLog(@"UTF-16 Percent Escaped : %@", anUTF16PercentEscapedString );

NSLog(@"UTF-8 : %@", anUTF8String );
NSLog(@"UTF-8 Percent Escaped : %@", anUTF8PercentEscapedString );
NSLog(@"Reverted from UTF-8 Percent Escaped : %@", revertedUTF8String );
NSLog(@"bytes : %s", revertedCStringOne );

// 2. The data : EC 9E 90 EC 97 B0
size_t length = strlen( revertedCStringOne );
size_t i;
for( i = 0; i < length; i++ )
{
NSLog(@"%02X", (unsigned char)revertedCStringOne[i] ); // the cast avoids sign extension
}
printf("\n");

// 3. Data from a Final Cut Pro XML project file, which also displays as "자연"
// These bytes look very different from the ones in // 2.
char test[] = "\xE1\x84\x8C\xE1\x85\xA1\xE1\x84\x8B\xE1\x85\xA7\xE1\x86\xAB";
length = strlen( test );
for( i = 0; i < length; i++ )
{
NSLog(@"%02X", (unsigned char)test[i] );
}
printf("\n");

// 4. It prints the same "자연"
NSString *questionedString = [NSString stringWithUTF8String:test];
NSLog(@"Questioned String = %@", questionedString );

// 5. Its percent-escaped form matches // 3, not // 2
NSString *questionedPercentEscapedString = [questionedString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", questionedPercentEscapedString);
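
Going the other way, composing the FCP jamo with the Unicode Hangul composition rules gives back U+C790 U+C5F0, the same code points as // 2, which would explain why both strings display as "자연". A minimal C sketch; hangul_compose is a hypothetical helper that handles only Hangul (in Cocoa, -[NSString precomposedStringWithCanonicalMapping] performs full canonical composition):

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28
};

/* Compose conjoining jamo (a leading consonant L plus a vowel V, optionally
   followed by a trailing consonant T) into precomposed syllables. Other code
   points are copied unchanged. out needs room for len code points; returns
   the number written. */
static size_t hangul_compose(const uint32_t *in, size_t len, uint32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = in[i];
        if (i + 1 < len
            && c >= L_BASE && c < L_BASE + L_COUNT
            && in[i + 1] >= V_BASE && in[i + 1] < V_BASE + V_COUNT) {
            uint32_t s = S_BASE + ((c - L_BASE) * V_COUNT
                                   + (in[i + 1] - V_BASE)) * T_COUNT;
            i++;                                  /* consumed the vowel */
            if (i + 1 < len && in[i + 1] > T_BASE
                && in[i + 1] < T_BASE + T_COUNT) {
                s += in[i + 1] - T_BASE;
                i++;                              /* consumed the final */
            }
            out[n++] = s;
        } else {
            out[n++] = c;
        }
    }
    return n;
}
```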





_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
