Re: To show or not to show french accents

Marcin Benke Fri, 19 Dec 2003 09:55:18 -0800

MR K P SCHUPKE wrote:

The problem is that if you are reading single bytes, 233 is not necessarily é.


Erm, internationalisation is not my thing as such... but I can't help
commenting that from a systems point of view this is an utterly bad
situation to be in... I thought Haskell used Unicode? I thought in Unicode
the id of a character was fixed irrespective of language. Where is
Unicode support lacking?

Regards, Keean Schupke.

Quoting from the latest version of the Unicode Standard:

"The Unicode Standard specifies a numeric value (code point) and a name for each of its characters. [...] Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8)."

Hence in Unicode proper, characters are identified by numbers ("code points"), not bytes. The byte-oriented encoding form is UTF-8.

In UTF-8, however, the byte 233 (0xE9) does not represent any character on its own; it can only occur as the first byte of a 3-byte sequence. On the other hand, UTF-8 encodes characters in the ASCII range exactly as ASCII does.
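
To make the byte-level point concrete, here is a small hand-rolled sketch in Haskell. It is only an illustration, not how GHC or any library does it: the names encodeUtf8 and sequenceLength are made up for this message, the encoder does not reject surrogate code points, and the lead-byte check is simplified (it does not exclude the bytes 0xC0, 0xC1 and 0xF5 to 0xFF, which never appear in valid UTF-8).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single code point as UTF-8 bytes.
-- Hypothetical helper for illustration; surrogates are not rejected.
encodeUtf8 :: Char -> [Word8]
encodeUtf8 c
  | n < 0x80    = [fromIntegral n]                          -- ASCII range: one byte
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                   -- two bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]          -- three bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0] -- four bytes
  where
    n      = ord c
    -- Leading payload bits, placed in the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Length of the sequence that a given first byte announces, or
-- Nothing if the byte is a continuation byte (10xxxxxx) or otherwise
-- cannot start a sequence. Simplified: see the caveats above.
sequenceLength :: Word8 -> Maybe Int
sequenceLength b
  | b < 0x80           = Just 1
  | b .&. 0xE0 == 0xC0 = Just 2
  | b .&. 0xF0 == 0xE0 = Just 3
  | b .&. 0xF8 == 0xF0 = Just 4
  | otherwise          = Nothing
```

With these definitions, encodeUtf8 '\233' (the code point of é) gives [195,169], i.e. the two bytes 0xC3 0xA9, while sequenceLength 233 gives Just 3: as a byte in a UTF-8 stream, 233 announces a 3-byte sequence and is not é at all. For plain ASCII, encodeUtf8 'e' is just [101].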

Regards,
   Marcin Benke


