> From: David Kastrup <d...@gnu.org>
> Date: Mon, 30 Jan 2017 19:32:14 +0100
> Cc: guile-user@gnu.org
> 
> Emacs uses a UTF-8 based encoding internally: basically, valid UTF-8 is
> represented as itself, there are a number of code points beyond the
> actual limit of UTF-8 that are used for non-Unicode character sets, and
> single bytes not properly belonging to the read encoding are represented
> with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xc1 0xbf (the
> latter two ranges are "overlong" encodings of 0x00...0x7f and
> consequently also not valid UTF-8).

One other crucial detail is that Emacs also has unibyte strings
(arrays of bytes), which are necessary during startup, when Emacs
doesn't yet know how to decode non-ASCII strings.  Without that, you
wouldn't be able to start Emacs in a directory whose name includes
non-ASCII characters, because it couldn't access files it needs to
read to set up some of its decoding machinery.
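
To make the raw-byte representation described in the quoted text more
concrete, here is a rough Python sketch (not Emacs code).  It assumes
the straightforward bit layout implied by the ranges above: bytes
0x80...0xbf map to 0xc0 0x80 ... 0xc0 0xbf and bytes 0xc0...0xff map to
0xc1 0x80 ... 0xc1 0xbf.  The function names are made up for the
illustration.

```python
def encode_raw_byte(b: int) -> bytes:
    """Map a raw byte 0x80..0xFF to a two-byte overlong sequence.

    Assumed layout: the lead byte is 0xC0 or 0xC1 (carrying bit 6 of
    the raw byte), the trail byte carries the low six bits.
    """
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF use the two-byte form")
    return bytes([0xC0 | ((b >> 6) & 0x01), 0x80 | (b & 0x3F)])


def decode_raw_byte(pair: bytes) -> int:
    """Inverse of encode_raw_byte: recover the original raw byte."""
    lead, trail = pair
    if lead not in (0xC0, 0xC1) or not 0x80 <= trail <= 0xBF:
        raise ValueError("not a raw-byte sequence")
    return 0x80 | ((lead & 0x01) << 6) | (trail & 0x3F)


def is_valid_utf8(data: bytes) -> bool:
    """Check whether a strict UTF-8 decoder accepts the data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Since a strict UTF-8 decoder rejects the lead bytes 0xc0 and 0xc1,
these sequences can never be confused with properly decoded text,
which is the point of using overlong forms for stray bytes.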
