Re: Problem with national characters in XHTML

Tomas Zerolo Fri, 30 Sep 2005 21:34:27 -0700

On Sat, Oct 01, 2005 at 01:02:31AM +0200, Lennart Borgman wrote:
> Piet van Oostrum wrote:
[...]
> >That is just the internal representation of the character in Emacs. It's
> >not important. What matters is what Emacs writes to your file. When you
> >write out utf-8 (for example by giving the command
[...]
> So you mean that at a - what should I call it? - "text semantic level" 
> the utf-8 char and the latin-1 char has the same meaning?


Yes. You put that nicely. The *character* (a dieresis) stays the same.
The *representation* (loosely referred to as `encoding') changes.

I said loosely, because on more complex things as utf-8 there are
actually two layers: the `character set', mapping each character to an
integer (aka `code point', which in this case would be UNICODE or
ISO-10646, which nowadays are equivalent), and the representation in a
file, which may be utf-8 (most common), ucs-16 or whatnot.

Now the advantage of utf-8: it is a variable-width encoding, and uses up
just one byte for one ASCII character (on ASCII it uses the same code
points). So you can interpret an ASCII file ``as-is'' as an utf-8 file.

For higher characters (the ones, for example with codes >127 in
iso-8859-1 (aka Latin1)), you need more than one byte in utf-8. AFAIK,
up to 6 bytes, but don't take that too seriously.

The disadvantage is: it is a variable-width encoding, so you have to
process a text sequentially, byte-for-byte to get the character
boundaries right (it's designed to re-synchronize gracefully, though).

If you want the whole story (on UNICODE, ISO10646, UTF8), see here:

  <http://www.cl.cam.ac.uk/~mgk25/unicode.html>

(very recommended). From the perspective of a web slave, see:

  <http://www.w3.org/TR/REC-html40/charset.html>

HTH
-- tomas

signature.asc
Description: Digital signature

_______________________________________________
Emacs-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-devel

Re: Problem with national characters in XHTML

Reply via email to