   I don't have a problem with all the character encoding infrastructure
   and whatnot, but it makes sense for a language API to provide access
   to the lowest level of IO possible. Whether the more advanced
   operations are implemented in Haskell itself or specially by the
   compiler is an implementation issue and an optimization. The point is
   that if I wanted to write my own character handling code in Haskell,
   there would be no way to do it, because the base IO primitives
   (reading and writing bytes) are hidden below the current IO API and
   'implementation defined' behavior.
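   To illustrate the kind of character handling code that could be
   written in plain Haskell once raw bytes are available, here is a
   minimal sketch of a UTF-8 decoder over a list of bytes. It assumes
   Word8 from Data.Word as the byte type; the name decodeUtf8 is
   hypothetical. Malformed or truncated sequences become U+FFFD, and for
   brevity it does not reject overlong encodings or surrogate code
   points.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical decoder: turns a UTF-8 byte stream into Chars.
-- Malformed, truncated, or out-of-range sequences yield U+FFFD.
-- Simplification: overlong forms and surrogates are not rejected.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8 bs  -- ASCII
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)  -- 2-byte sequence
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)  -- 3-byte sequence
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)  -- 4-byte sequence
  | otherwise          = replacement : decodeUtf8 bs  -- invalid lead byte
  where
    -- Consume n continuation bytes after a lead byte carrying 'lead' bits.
    multi n lead =
      let (conts, rest) = splitAt n bs
          code = foldl step (fromIntegral lead) conts :: Int
      in if length conts == n && all isCont conts && code <= 0x10FFFF
           then chr code : decodeUtf8 rest
           else replacement : decodeUtf8 bs  -- resync at the next byte
    -- Each continuation byte contributes its low six bits.
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    isCont c = c .&. 0xC0 == 0x80
    replacement = '\xFFFD'
```

   For example, decodeUtf8 [0xE6, 0x97, 0xA5] yields the one-character
   string containing U+65E5, while a lone 0xC3 yields "\xFFFD".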
   
   My proposal was meant to satisfy three different needs which are not
   currently met by the Haskell IO primitives:
     * Program determinism (soundness): the goal is to write one program
       that is guaranteed to produce exactly the same byte output for a
       given byte-stream input, regardless of the compiler or platform it
       runs on. If this is impossible on a given platform, compilation
       should fail with an error.
     * IO to and from externally defined byte formats: XDR-encoded files,
       RIFF files, network protocols, and unusual character encodings.
       People need to be able to read and write these in a standard way,
       to files as well as to sockets and pipes.
     * There is no Byte type for people writing new libraries. Each
       person who writes a library that works on byte streams must come
       up with their own kludge; this often causes needless namespace
       conflicts, and even more often it is not strictly portable. A
       common workaround is to assume that a Char is a byte, which is not
       true: the language specifies that a Char is a Unicode character,
       and arbitrary binary data is not. This leads to confusion and
       programming errors because the type information is lost. Nothing
       distinguishes a raw UTF-8 encoded byte stream from an actual
       Haskell string, so people will be tempted to call functions like
       isLower on those Chars. That will deceptively work as long as they
       stick to ASCII, and then they will be perplexed when their program
       mysteriously stops working in Japan. There are other obvious
       differences as well: Bytes should be Integral, and Chars should
       not. On many platforms a Char will be 32 bits, which means a
       fourfold increase in the space needed for evaluated byte streams.
       Byte streams are often used in areas that have nothing at all to
       do with characters or string encoding, and using String or [Char]
       there would confuse new and experienced users alike. If a database
       API has a function lookup :: [Char] -> [Char], does it only work
       on strings, or on arbitrary byte streams? What character encoding
       does the database use if I want to access it from C? The answers
       to all these questions are lost with that signature, as opposed to
       the utterly unambiguous lookup :: [Byte] -> [Byte].
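   As a minimal sketch of what a real byte type buys, the following
   assumes type Byte = Word8 (Byte is a hypothetical name here, not an
   existing standard type) and encodes XDR values as described in
   RFC 4506: an unsigned integer is four big-endian bytes, and
   variable-length opaque data is its length followed by the bytes,
   zero-padded to a multiple of four. The function names are
   hypothetical. Because Byte is Integral, the shifts and arithmetic need
   no ord/chr conversions.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Hypothetical byte type: 8 bits, Integral, and distinct from Char.
type Byte = Word8

-- XDR unsigned int: four bytes, most significant first.
-- fromIntegral truncates each shifted Word32 to its low 8 bits.
encodeXdrUInt :: Word32 -> [Byte]
encodeXdrUInt w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Decode one XDR unsigned int, returning it with the remaining bytes,
-- or Nothing if fewer than four bytes are available.
decodeXdrUInt :: [Byte] -> Maybe (Word32, [Byte])
decodeXdrUInt bs = case splitAt 4 bs of
  (q@[_, _, _, _], rest) ->
    Just (foldl (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0 q, rest)
  _ -> Nothing

-- XDR variable-length opaque: length prefix, data, then 0-3 zero bytes
-- so the total is a multiple of four.
encodeXdrOpaque :: [Byte] -> [Byte]
encodeXdrOpaque bs = encodeXdrUInt (fromIntegral n) ++ bs ++ replicate pad 0
  where
    n = length bs
    pad = (4 - n `mod` 4) `mod` 4
```

   For example, encodeXdrOpaque [0x61, 0x62, 0x63] gives
   [0,0,0,3,97,98,99,0]: a length of 3, the three bytes, and one byte of
   padding. The signature [Byte] -> [Byte] leaves no doubt that no
   character encoding is involved.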
       
   Without these properties it is almost impossible to write any 'real'
   application in Haskell, because most programs need to do at least one
   interesting thing (where 'interesting' means interacting with the
   outside world in a new or previously undefined way), and the
   programmer needs to know that his Haskell program or library will
   work as expected across platforms.
   
   The solution need not be the best or even the most efficient; it
   merely needs to solve these problems in a way that does not conflict
   with the current language APIs and definition, and that is not
   difficult for any compiler writer to implement. The reason is that it
   is unclear at this time what a completely efficient API would look
   like. Different compilers have their own efficient IO extensions,
   which people can use if they are that concerned, but it needs to be
   possible to write portable programs and libraries that have the above
   properties independently of those extensions. That is what I designed
   my API to be: easily implemented on top of the current compilers'
   private binary APIs, yet simple and modeled after the current Prelude
   functions, so as to mesh well with the existing codebase and
   mindshare.
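   A minimal sketch of how thin such a layer can be, assuming GHC's
   System.IO binary-mode handles (openBinaryFile, withBinaryFile), which
   read and write each Char in the range 0-255 as a single byte with no
   newline translation. readBinaryFile and writeBinaryFile are
   hypothetical names chosen to mirror the Prelude's readFile and
   writeFile; they are not the API from the original proposal.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO

-- Hypothetical byte type, as elsewhere in this sketch.
type Byte = Word8

-- Like readFile, but returns raw bytes. A binary-mode handle maps each
-- byte to a Char in 0-255, which is converted back to a Byte here.
-- Like readFile, it reads lazily and leaves the handle semi-closed.
readBinaryFile :: FilePath -> IO [Byte]
readBinaryFile path = do
  h <- openBinaryFile path ReadMode
  s <- hGetContents h
  return (map (fromIntegral . ord) s)

-- Like writeFile, but writes raw bytes. Each Byte becomes a Char in
-- 0-255, which a binary-mode handle writes out as exactly one byte.
writeBinaryFile :: FilePath -> [Byte] -> IO ()
writeBinaryFile path bytes =
  withBinaryFile path WriteMode $ \h ->
    hPutStr h (map (chr . fromIntegral) bytes)
```

   The design choice here is to reuse the existing binary handles rather
   than add new primitives, so any compiler that already supports binary
   mode could provide these functions in a few lines.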

                John
-- 
--------------------------------------------------------------
John Meacham   http://repetae.net/~john/   [EMAIL PROTECTED]
California Institute of Technology, Alum.  [EMAIL PROTECTED]
--------------------------------------------------------------
