Problems encoding the spanish o

2003-11-17 Thread pepe pepe
Hello:

We have the following sequence of characters, "...ización Map...", which is
the same as "...izaci&#243;n Map...". After going through some
transformations it becomes "...izaci&#56186;&#56333;ap".
As you can see, the two characters 56186 and 56333 seem to stand in for the
sequence "ón M". Any idea why?

Regards,
Mario.
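
As a quick check of the numbers in this message, here is a minimal Python sketch. The helper name `describe` is made up for illustration. It suggests that 243 is simply ó, while 56186 and 56333 fall in the UTF-16 high and low surrogate ranges, so together they would be the two halves of a single surrogate pair rather than two independent characters.

```python
def describe(code: int) -> str:
    """Classify a numeric character reference value (hypothetical helper)."""
    if 0xD800 <= code <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= code <= 0xDFFF:
        return "low surrogate"
    return "character " + chr(code)

# The three numbers that appear in the message above.
for value in (243, 56186, 56333):
    print(value, hex(value), describe(value))
```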




Re: Problems encoding the spanish o

2003-11-17 Thread pepe pepe
Hello:

  My knowledge of encodings is very poor, and you seem to know a lot about
this. Could you explain a bit more what you said? I have done the
following:

This is the problematic sequence: 11110011-01101110-00100000-01001101
(F3-6E-20-4D). If I follow the instructions given under the question "What
is UTF-8?" in the UTF-8 FAQ, I obtain
011101110100000001101 instead of 1EE80D (111101110100000001101). Have I made
a mistake? Applying the UTF-16 encoding to my result, everything works out.
So, finally, who do you think is responsible for this strange situation:
the client, for saying that the document is UTF-8, or the parser?

Regards,
Mario.
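
The bit arithmetic described above can be checked with a short Python sketch. It assumes the four-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx is applied blindly, without verifying that the last three bytes really are continuation bytes. Under that assumption the result is the 21-bit string 011101110100000001101, that is, hex EE80D.

```python
data = bytes([0xF3, 0x6E, 0x20, 0x4D])  # "ón M" as Latin-1 bytes

# Take 3 payload bits from the lead byte and 6 from each following byte.
# No check is made that the followers look like 10xxxxxx (here they do not).
bits = format(data[0] & 0x07, "03b") + "".join(
    format(b & 0x3F, "06b") for b in data[1:]
)
value = int(bits, 2)

print(bits, hex(value))
```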


From: Pim Blokland [EMAIL PROTECTED]
To: Unicode mailing list [EMAIL PROTECTED]
Subject: Re: Problems  encoding the spanish o
Date: Mon, 17 Nov 2003 13:26:19 +0100
pepe pepe wrote:

> We have the following sequence of characters, "...ización Map...",
> which is the same as "...izaci&#243;n Map...". After going through some
> transformations it becomes "...izaci&#56186;&#56333;ap".
> As you can see, the two characters 56186 and 56333 seem to stand in for
> the sequence "ón M". Any idea why?

Yes, your input text obviously gets flagged as being in UTF-8
format, even if it is Latin-1 (or any codepage that has a ó at index
243).
Not only that, but the process making the mistake of thinking it is
UTF-8 also makes the mistake of not generating an error for
encountering malformed byte sequences, AND of outputting the result
as two 16-bit numbers instead of one 21-bit number.
If you take the byte sequence (hex) F3 6E 20 4D and treat it as
UTF-8 and don't care it's not valid, this maps to the value
(hex)1EE80D. Again, not caring this is not a valid codepoint,
turning this into UTF-16 would yield U+DB7A U+DC0D, which is what
you got in your output.
Pim Blokland
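
The failure mode described in this reply can be reproduced with a small Python sketch. The function name `to_surrogates` is made up for illustration, and the lenient bit extraction only imitates what a non-validating decoder might do; it is not taken from any particular parser. A strict decoder such as Python's rejects the bytes, while the lenient reading followed by UTF-16 encoding gives the two numbers from the original question.

```python
raw = "ón M".encode("latin-1")  # bytes F3 6E 20 4D

# A strict UTF-8 decoder refuses this input instead of guessing.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("strict decode succeeded:", strict_ok)

# Lenient reading: treat the bytes as one 4-byte UTF-8 sequence anyway.
value = (
    ((raw[0] & 0x07) << 18)
    | ((raw[1] & 0x3F) << 12)
    | ((raw[2] & 0x3F) << 6)
    | (raw[3] & 0x3F)
)

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a value above 0xFFFF into a UTF-16 surrogate pair."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

high, low = to_surrogates(value)
print(f"U+{high:04X} U+{low:04X}", high, low)
```

With these inputs the sketch yields the pair DB7A DC0D, which in decimal is 56186 and 56333.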


