On Wed, 15 Jan 2014 23:00:18 +0200 Eli Zaretskii <e...@gnu.org> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <ch...@cvine.freeserve.co.uk>
> > Cc: guile-user@gnu.org
> >
> > POSIX system calls are encoding agnostic. The filename is just a
> > series of bytes terminating with a NUL character. All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
>
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways. For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
>
> So I think using UTF-8 internally is a much better way.
I am not sure what you mean, as I am not talking about internal use. Guile
uses ISO-8859-1 and UTF-32 internally for all its strings, which is fine;
glib uses UTF-32 and UTF-8 internally for most purposes. It is the external
representation which is at issue. Looking up a file is just an encoding
transformation for the library concerned (be it guile, glib or anything
else).

As it happens (although this is beside the point), using a byte value or
sequence in a filename which the operating system reserves for the '/'
character, for a purpose other than designating a pathname, or a NUL
character for designating anything other than the end of the filename, is
not POSIX compliant and will not work on any operating system I know of,
including Windows. (As for POSIX, see SUS, Base Definitions, sections 3.170
(Filename) and 3.267 (Pathname).)

But as I say, that is irrelevant. Whatever the filesystem encoding happens
to be, it happens to be. It might not be a narrow encoding at all.

Chris