Re: Convert unicode string into ascii

Ricky Sharp Thu, 28 Aug 2008 13:58:29 -0700


On Aug 28, 2008, at 3:40 PM, Andrew Farmer wrote:

On 28 Aug 08, at 12:08, Ricky Sharp wrote:
Just to point this out, the sequence of ASCII may not be useful at all if the file is say Unicode. The actual bytes making up each char could be "ASCII" values themselves.
Unicode is a character set, not an encoding. I'm not sure about UTF-16 or other stranger encodings, but I do know that any UTF-8 character below 0x80 corresponds directly to a single ASCII character. This is a design feature of the encoding.

Yeah, what I wrote wasn't clear. I meant if the file _contains_ Unicode.

You're correct that the UTF-8 encoding preserves all 7-bit ASCII characters as-is. When you get into the UTF-16 and UTF-32 variants, though, individual bytes can fall anywhere in the range [0x00..0xFF], so some of them will coincide with ASCII values.

So, CJK UNIFIED IDEOGRAPH-4142 (U+4142) stored in UTF-16BE is the byte pair 0x41 0x42, which will appear as "AB".
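To make that concrete, here is a small sketch in Python (not Cocoa; chosen only because it is easy to run) that encodes U+4142 a few ways and looks at the raw bytes. The variable names are mine and purely illustrative.

```python
# U+4142 is the code point from the message above; the names are illustrative.
ideograph = "\u4142"

# UTF-16BE stores the code unit high byte first: 0x41 then 0x42.
utf16be = ideograph.encode("utf-16-be")
# Read back as ASCII, those two bytes are the letters 'A' and 'B'.
as_ascii = utf16be.decode("ascii")

# UTF-8 never uses bytes below 0x80 for a non-ASCII character,
# so this confusion cannot happen there.
utf8 = ideograph.encode("utf-8")
utf8_has_ascii_bytes = any(b < 0x80 for b in utf8)

# Plain ASCII text is byte-for-byte identical in UTF-8.
ascii_roundtrip = "AB".encode("utf-8")

print(utf16be.hex(), as_ascii, utf8.hex(), utf8_has_ascii_bytes)
```

In little-endian order (UTF-16LE) the same character would be stored as 0x42 0x41, so the byte order matters as well as the encoding.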

I know the OP mentioned this is for a hex editor. I just looked at what my copy of "0xED" does. It too reduces text to ASCII, just as the OP described. The app also offers a detail view which shows the current selection of bytes as common data types. One of those types is a string, which is interpreted using the encoding the user chooses.
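A rough sketch of those two views, again in Python. This is not how 0xED is implemented (I have no idea what it does internally); the function names, the '.' placeholder and the sample bytes are all assumptions for illustration.

```python
def ascii_column(data: bytes) -> str:
    """Byte-by-byte text column: printable ASCII as-is, anything else as '.'.

    The '.' placeholder is a common hex-editor convention, assumed here.
    """
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


def decode_selection(data: bytes, encoding: str) -> str:
    """Detail view: interpret the selected bytes using a caller-chosen encoding.

    Undecodable sequences become U+FFFD instead of raising an error.
    """
    return data.decode(encoding, errors="replace")


# Hypothetical selection: U+4142 followed by "Hi", all stored as UTF-16BE.
selection = bytes([0x41, 0x42, 0x00, 0x48, 0x00, 0x69])

print(ascii_column(selection))
print(decode_selection(selection, "utf-16-be"))
```

The first view treats every byte independently, which is why the ideograph shows up as "AB"; the second only gives the right text if the user picks the encoding the file was actually written in.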


___________________________________________________________
Ricky A. Sharp         mailto:[EMAIL PROTECTED]
Instant Interactive(tm)   http://www.instantinteractive.com



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
