On Aug 28, 2008, at 3:40 PM, Andrew Farmer wrote:

On 28 Aug 08, at 12:08, Ricky Sharp wrote:
Just to point this out: the sequence of ASCII characters may not be useful at all if the file is, say, Unicode. The actual bytes making up each character could be "ASCII" values themselves.

Unicode is a character set, not an encoding. I'm not sure about UTF-16 or other stranger encodings, but I do know that any UTF-8 character below 0x80 corresponds directly to a single ASCII character. This is a design feature of the encoding.
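
For anyone who wants to check the UTF-8 property, here is a rough Python sketch (not Cocoa code; the sample string is just an arbitrary pick of non-ASCII characters). It confirms that every code point below 0x80 encodes to the same single byte, and that the bytes of multi-byte sequences all sit at 0x80 or above, so they can never be mistaken for ASCII.

```python
# Every code point below 0x80 encodes to the identical single ASCII byte.
for cp in range(0x80):
    assert chr(cp).encode("utf-8") == bytes([cp])

# Arbitrary non-ASCII sample (an assumption for illustration only).
sample = "\u00e9\u4e2d\U0001F600"
multi = sample.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence has its high bit set.
all_high = all(b >= 0x80 for b in multi)
print(multi.hex(" "), all_high)
```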


Yeah, what I wrote wasn't clear. I meant if the file _contains_ Unicode.

You're correct that the UTF-8 encoding preserves all 7-bit ASCII characters as-is. When you get into the UTF-16 and UTF-32 variants, though, the individual bytes of a character can take any value in the range [0x00..0xFF], including values that look like ASCII.

So CJK UNIFIED IDEOGRAPH-4142 (U+4142) stored in UTF-16BE will appear as "AB", since its two bytes are 0x41 and 0x42.
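
A quick way to see this, sketched in Python rather than Cocoa (the variable names are only illustrative): encode U+4142 in a few encodings and look at the raw bytes. Only UTF-8 keeps the bytes out of the ASCII range.

```python
ch = "\u4142"  # CJK UNIFIED IDEOGRAPH-4142

be = ch.encode("utf-16-be")    # bytes 0x41 0x42, which read as "AB"
le = ch.encode("utf-16-le")    # bytes 0x42 0x41, which read as "BA"
u32 = ch.encode("utf-32-be")   # 0x00 0x00 0x41 0x42
u8 = ch.encode("utf-8")        # 0xE4 0x85 0x82, no ASCII-looking bytes

for name, raw in [("UTF-16BE", be), ("UTF-16LE", le),
                  ("UTF-32BE", u32), ("UTF-8", u8)]:
    print(f"{name:9} {raw.hex(' ')}")

# Going the other way: the ASCII text "AB" read as UTF-16BE is that ideograph.
assert b"AB".decode("utf-16-be") == ch
```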


I know the OP mentioned this is for a hex editor. I just looked at what my copy of "0xED" does. It too reduces text to ASCII, just as the OP described. The app also offers a detail view that shows the current selection of bytes as common data types. One of those types is a string, which is interpreted using an encoding of the user's choice.
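
Roughly, the two views could be sketched like this in Python. This is only my guess at the general approach, not how 0xED is actually implemented; `ascii_column` and `interpret` are hypothetical names, and showing non-printable bytes as "." is an assumed convention.

```python
def ascii_column(data: bytes) -> str:
    # Show printable 7-bit ASCII as-is; everything else becomes "."
    # (assumed convention for the ASCII column of a hex view).
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


def interpret(selection: bytes, encoding: str) -> str:
    # Decode the selected bytes with the user's chosen encoding,
    # substituting U+FFFD for anything that does not decode.
    return selection.decode(encoding, errors="replace")


data = "A\u4142".encode("utf-16-be")   # bytes: 00 41 41 42

print(ascii_column(data))              # the ASCII column of the hex view
print(interpret(data, "utf-16-be"))    # the detail view, decoded as UTF-16BE
```

The ASCII column only ever looks at single bytes, which is why it can show "AB" for something that is really one ideograph. The detail view gets it right only when the chosen encoding matches the data.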

___________________________________________________________
Ricky A. Sharp         mailto:[EMAIL PROTECTED]
Instant Interactive(tm)   http://www.instantinteractive.com


