On Apr 23, 2007, at 22:48 UTC, Norman Palardy wrote:

> It seems to be ISO Latin 1 data represented in 16 bits per character.
> The data is almost exclusively NULL (00) followed by an ISO Latin 1  
> code point.

Sounds like UCS-2 or UTF-16 in big-endian format (the 00 is the high
byte, and it comes first).
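
A quick way to check the byte order is to decode a few bytes both ways
and see which result is readable. Here is a minimal sketch in Python;
the sample bytes are made up to match the description above (a 00 byte
followed by a Latin-1 code point), not taken from the actual data.

```python
# Hypothetical sample: each character is a 00 byte followed by a
# Latin-1 code point, as described in the quoted message.
data = bytes([0x00, 0x48, 0x00, 0x69])

# Decode the same bytes under both byte orders.
as_be = data.decode("utf-16-be")  # high byte first
as_le = data.decode("utf-16-le")  # low byte first

print(repr(as_be))  # readable text if the data is big-endian
print(repr(as_le))  # unrelated characters if it is not
```

With 00 as the first byte of each pair, only the big-endian decoding
gives back Latin-1 text; the little-endian decoding lands on code points
such as U+4800.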

> Using the Guess the Encoding mechanism in String Utils doesn't  
> suggest ISO Latin 1 or UCS 2 either.

I'm surprised -- it should guess UTF-16, but if you're running on a
Mac, it may be wrong-endian.  Note that there are (if I've uploaded the
latest!) two versions of GuessEncoding: one can properly report such
wrong-endian cases, and the other ignores them.  I think you'll also
find a function to swap every two bytes to correct a wrong-endian
string.
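
For illustration, the two ideas (spotting the byte order, then swapping
every two bytes) might look roughly like this in Python. This is only a
sketch under simple assumptions: the function names are hypothetical,
this is not the String Utils code, and the guess only works for text
that is mostly Latin-1 range, where one byte of each pair is zero.

```python
def guess_utf16_byte_order(data: bytes) -> str:
    """Guess byte order by counting zero bytes at even and odd offsets.

    Hypothetical heuristic: for mostly Latin-1 text in UTF-16, the zero
    high bytes sit at even offsets (big-endian) or odd offsets
    (little-endian).
    """
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "big"
    if odd_zeros > even_zeros:
        return "little"
    return "unknown"


def swap_byte_pairs(data: bytes) -> bytes:
    """Swap every two bytes; a trailing odd byte is left unchanged."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)
    # Exchange the bytes at even offsets with those at odd offsets.
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)


sample = b"\x00H\x00i"             # made-up big-endian sample
swapped = swap_byte_pairs(sample)  # now little-endian
print(guess_utf16_byte_order(sample), guess_utf16_byte_order(swapped))
print(swapped.decode("utf-16-le"))
```

Swapping twice returns the original bytes, so the same function
converts in either direction.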

HTH,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]

