On Thu, Sep 13, 2007 at 11:07:03AM +0200, Stephane Bortzmeyer wrote:
> On Thu, Sep 13, 2007 at 12:23:33AM +0000,
> Aaron Denney <[EMAIL PROTECTED]> wrote
> a message of 76 lines which said:
>
> > the characters read and written should correspond to the native
> > environment notions and encodings. These are, under Unix,
> > determined by the locale system.
>
> Locales, while fine for things like the language of the error messages
> or the format to use to display the time, are *not* a good solution
> for things like file names and file contents.
I never claimed it was a good system, merely that it was the system. Yes, serious applications should use byte-oriented I/O and explicitly manage character sets when necessary. STDIO in general, and terminal interaction in particular, should use the locale selected by the user.

> Even on a single Unix machine (without networking), there are
> *several* users. Using the locale to find out the charset used for a
> file name won't work if these users use different locales.
>
> Same thing for file contents. The charset used must be marked in the
> file (XML...) or in the metadata, somehow.

For file system and network access, the justification is a bit more clouded, but the interfaces there _should not_ be character interfaces. Character interfaces are _lies_; Word8s are what actually get passed, and trying to treat them as Unicode characters with any fixed mapping breaks. At best we get an extremely leaky abstraction.

Filesystems are not uniform across systems, yet Haskell tries to present a uniform view that manages to capture exactly no existing system. File contents (almost) everywhere are streams of bytes (ignoring, say, old record-based OSes, Palm databases, Mac resource forks, etc.). Almost all file systems use a hierarchical directory system, but with significant differences. Under Unixes, file names are NUL-terminated byte strings that can't contain slashes. New Macs and Windows have specific character encodings (UTF-8 and UTF-16, respectively). DOS, old Macs, and Windows have multiple roots, various directory separators, and forbidden characters. Trying to specify an API that robust programs can use on any of these is hard.

I'd actually have preferred that the standard didn't even try, and instead provided system-specific annexes. Then an external library that was freer to evolve could try to solve the problem of providing a uniform interface that would not defy platform expectations.
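The point that serious programs should do byte-oriented I/O and make the character-set decision explicitly can be sketched as follows. This is an illustration, not code from the post: it assumes the `bytestring` and `text` packages (both ship with GHC), the names `readBytes`, `decodeUtf8Explicit`, `validSample` and `invalidSample` are hypothetical, and UTF-8 is only an example charset; a real program would take it from metadata or an in-file declaration.

```haskell
-- A minimal sketch: treat file contents as raw bytes (Word8s) and make the
-- character-set decision explicit, instead of trusting the locale.
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper: read a file as uninterpreted bytes; no locale is consulted.
readBytes :: FilePath -> IO B.ByteString
readBytes = B.readFile

-- Hypothetical helper: decode bytes as UTF-8, reporting failure instead of
-- guessing. UTF-8 is an assumption; the caller must know the real charset.
decodeUtf8Explicit :: B.ByteString -> Either UnicodeException T.Text
decodeUtf8Explicit = TE.decodeUtf8'

-- Valid UTF-8: 'h' followed by U+00E9 encoded as two bytes.
validSample :: B.ByteString
validSample = B.pack [0x68, 0xC3, 0xA9]

-- A 0xFF byte never occurs in UTF-8, yet it is a perfectly legal byte
-- in a Unix file name or in a file body.
invalidSample :: B.ByteString
invalidSample = B.pack [0x68, 0xFF]
```

Loaded into GHCi, `decodeUtf8Explicit validSample` yields a `Right` value, while `decodeUtf8Explicit invalidSample` yields a `Left` error. The failure surfaces as a value the program must handle, rather than being hidden behind a fixed byte-to-character mapping.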
-- 
Aaron Denney
-><-
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe