> From: David Kastrup <d...@gnu.org>
> Date: Mon, 30 Jan 2017 19:32:14 +0100
> Cc: guile-user@gnu.org
>
> Emacs uses an UTF-8 based encoding internally: basically, valid UTF-8 is
> represented as itself, there is a number of coding points beyond the
> actual limit of UTF-8 that is used for non-Unicode character sets, and
> single bytes not properly belonging to the read encoding are represented
> with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xbf (the
> latter two ranges are "overlong" encodings of 0x00...0x7f and
> consequently also not valid utf-8).

One other crucial detail is that Emacs also has unibyte strings (arrays of bytes), which are necessary during startup, when Emacs doesn't yet know how to decode non-ASCII strings. Without that, you wouldn't be able to start Emacs in a directory whose name includes non-ASCII characters, because it couldn't access files it needs to read to set up some of its decoding machinery.
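
To make the raw-byte ranges in the quoted description concrete, here is a rough Python sketch (not Emacs code). The helper names are made up for illustration, and the exact bit assignment, i.e. which half of 0x80...0xff lands under the 0xc0 lead byte and which under 0xc1, is my assumption; the quoted text only gives the ranges.

```python
def encode_raw_byte(b: int) -> bytes:
    """Hypothetical helper: internal form of one undecodable byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    if b < 0x80:
        # ASCII bytes are represented as themselves.
        return bytes([b])
    # Assumed mapping: 0x80..0xbf -> 0xc0 0x80..0xbf,
    #                  0xc0..0xff -> 0xc1 0x80..0xbf.
    return bytes([0xC0 | ((b >> 6) & 1), 0x80 | (b & 0x3F)])


def decode_raw_byte(seq: bytes) -> int:
    """Hypothetical helper: inverse of encode_raw_byte."""
    if len(seq) == 1 and seq[0] < 0x80:
        return seq[0]
    if len(seq) == 2 and seq[0] in (0xC0, 0xC1) and 0x80 <= seq[1] <= 0xBF:
        return 0x80 | ((seq[0] & 1) << 6) | (seq[1] & 0x3F)
    raise ValueError("not a raw-byte sequence")


def is_valid_utf8(seq: bytes) -> bool:
    """True if a strict UTF-8 decoder accepts seq."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for b in (0x41, 0x80, 0xBF, 0xC0, 0xFF):
        enc = encode_raw_byte(b)
        print(f"{b:#04x} -> {enc.hex(' ')}  valid UTF-8: {is_valid_utf8(enc)}")
```

The point of the overlong forms is that a strict UTF-8 decoder rejects them, so a stray byte can never be confused with a properly decoded character, and the original byte can be recovered when the text is written back out.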