Hello:
My knowledge of encoding is very poor and you seem to know a lot about
this. Could you explain a bit more of what you said? I have done the
following:
This is the problematic sequence: 11110011-01101110-00100000-01001101
(F3-6E-20-4D). If I follow the instructions that appear under the question
"What is UTF-8?" in the UTF-8 FAQ, I obtain the following:
011101110100000001101 instead of 1EE80D (111101110100000001101). Have I
made a mistake? Applying the UTF-16 encoding to my result, everything works
out. So, to conclude, who do you think is responsible for this strange
situation: the client, for saying that the document is UTF-8, or the parser?
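
For reference, here is a minimal sketch of the bit layout I followed,
assuming the decoder does not validate anything: it takes the low 3 bits of
the lead byte and the low 6 bits of each of the next three bytes, even
though 6E, 20 and 4D are not real continuation bytes. The function name is
made up for illustration.

```python
def lenient_utf8_decode4(data: bytes) -> int:
    """Combine 4 bytes as if they were a UTF-8 sequence, with no validation."""
    b0, b1, b2, b3 = data
    # Lead byte 11110xxx carries 3 payload bits. Each following byte is
    # treated as 10xxxxxx and contributes its low 6 bits, even when its
    # top two bits are not actually 10 (an assumption about the parser).
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


value = lenient_utf8_decode4(bytes([0xF3, 0x6E, 0x20, 0x4D]))
print(f"{value:021b}")  # the 21 payload bits
print(f"{value:06X}")   # the same value in hex
```

With these masks the 21 bits come out as 011101110100000001101, that is,
0EE80D.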
Regards,
Mario.
From: Pim Blokland [EMAIL PROTECTED]
To: Unicode mailing list [EMAIL PROTECTED]
Subject: Re: Problems encoding the spanish o
Date: Mon, 17 Nov 2003 13:26:19 +0100
pepe pepe wrote:
We have the following sequence of characters, ...ización Map...,
which after undergoing some transformations becomes
...izaci&#56186;&#56333;ap
As you can see, the two numeric references 56186 and 56333 seem to
stand in for the sequence "ón M". Any idea?
Yes, your input text obviously gets flagged as being in UTF-8
format, even if it is Latin-1 (or any codepage that has a ó at index
243).
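
A small sketch of that diagnosis, assuming the original text really was
Latin-1: encoding "ón M" as Latin-1 gives exactly the four bytes in
question, and a strict UTF-8 decoder (Python's built-in one is used here
only as an example) refuses them.

```python
text = "\u00f3n M"              # "ón M"; U+00F3 is ó, index 243 in Latin-1
raw = text.encode("latin-1")
print(" ".join(f"{b:02X}" for b in raw))

try:
    raw.decode("utf-8")         # strict decoding is Python's default
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict UTF-8 decoder rejects the bytes:", exc.reason)
```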
Not only that, but the process making the mistake of thinking it is
UTF-8 also makes the mistake of not generating an error for
encountering malformed byte sequences, AND of outputting the result
as two 16-bit numbers instead of one 21-bit number.
If you take the byte sequence (hex) F3 6E 20 4D and treat it as
UTF-8 and don't care it's not valid, this maps to the value
(hex)1EE80D. Again, not caring this is not a valid codepoint,
turning this into UTF-16 would yield U+DB7A U+DC0D, which is what
you got in your output.
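
The UTF-16 step can be sketched as follows, assuming the converter does no
range check and simply keeps 10 bits for each half. The function name is
hypothetical. Under that assumption, both 0EE80D (what masking the four
bytes to 3+6+6+6 bits gives) and 1EE80D produce the same pair, because
the extra top bit is dropped by the 10-bit mask.

```python
def to_surrogates_unchecked(cp: int):
    """Split a value into a UTF-16 surrogate pair without any range check."""
    v = cp - 0x10000
    high = 0xD800 | ((v >> 10) & 0x3FF)  # top 10 bits; anything above is lost
    low = 0xDC00 | (v & 0x3FF)           # bottom 10 bits
    return high, low


for cp in (0x0EE80D, 0x1EE80D):
    high, low = to_surrogates_unchecked(cp)
    in_range = cp <= 0x10FFFF            # Unicode code points stop at 10FFFF
    print(f"{cp:06X} in range={in_range}: {high:04X} {low:04X} = {high} {low}")
```

The decimal values are the 56186 and 56333 seen in the output.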
Pim Blokland