I don't have a problem with all the character encoding infrastructure
and whatnot, but it makes sense logically in a language API to
provide access to the lowest level of IO that is possible. Whether the
more advanced operations themselves are implemented in Haskell or
specially by the compiler is an implementation issue and an
optimization. The point is that if I wanted to write my own character
handling code in Haskell, there is no way to, because the base IO
primitives (reading and writing bytes) are hidden below the current
IO API and 'implementation defined' behavior.

My proposal was meant to satisfy three different needs which are not
currently met by the Haskell IO primitives:
* Program determinism (soundness): the goal is to write one program,
guaranteed to produce the exact same byte output for a given byte
stream input regardless of the compiler or platform it is run on. If
this is impossible for a given platform, compilation should
die with an error.
* IO to/from externally defined byte formats: XDR encoded files,
RIFF files, network protocols, odd character encodings. People
need to be able to read and write these in a standard way, to files
as well as to things like sockets and pipes.
* There is no Byte type for people writing new libraries. Each
person who writes a library that works on byte streams must come
up with their own kludge. Oftentimes this causes needlessly
conflicting namespaces, and even more often it is not strictly
portable. A common solution is to assume a Char is a byte, which
is not true: the language specifies that a Char is a Unicode
character, which arbitrary binary data is not. This leads
to confusion and programming errors because the type information
is lost. There is nothing to distinguish a raw UTF-8 encoded byte
stream from an actual Haskell string. People will be tempted to
call things like isLower on the Chars, which will deceptively work
as long as they stick to ASCII, and then they will be perplexed
when their program mysteriously stops working in Japan. There
are other obvious differences too: Bytes should be Integral, Chars
should not. On many platforms Chars will be 32 bits, which results
in a fourfold increase in the space needed for evaluated byte
streams. Often you use byte streams in areas which have nothing at
all to do with character or string encoding, and the use of String
and [Char] in those cases would be confusing to new and experienced
users alike. If a database API has a function
lookup :: [Char] -> [Char], can it only work on strings, or on
arbitrary byte streams? What is the character encoding used in the
database if I want to access it from C? The answers to all these
questions are lost by using that definition as opposed to the
utterly unambiguous lookup :: [Byte] -> [Byte].
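
To make the third point concrete, here is a minimal sketch of the
Char-as-byte kludge going wrong. It assumes Byte is represented as
Word8 from Data.Word, which is only one possible choice; the names
bytesAsChars, eAcuteUtf8, misclassified and checksum are made up for
this illustration and are not part of any proposal.

```haskell
import Data.Char (chr, isLower)
import Data.Word (Word8)

-- Assumption: a Byte is modeled as Word8. Nothing above fixes
-- the representation; this is only for illustration.
type Byte = Word8

-- The common kludge: smuggle every byte through a Char.
bytesAsChars :: [Byte] -> String
bytesAsChars = map (chr . fromIntegral)

-- The UTF-8 encoding of U+00E9 (a lowercase letter) is two bytes.
eAcuteUtf8 :: [Byte]
eAcuteUtf8 = [0xC3, 0xA9]

-- Calling isLower on the smuggled bytes looks at U+00C3 (an
-- uppercase letter) and U+00A9 (a symbol), not at the character
-- that was actually encoded, so neither is reported as lowercase.
misclassified :: [Bool]
misclassified = map isLower (bytesAsChars eAcuteUtf8)

-- Bytes are Integral, so arithmetic works directly on them.
-- Word8 arithmetic wraps modulo 256.
checksum :: [Byte] -> Byte
checksum = sum
```

With a distinct Byte type, isLower simply does not typecheck on the
raw stream, so this class of mistake is caught at compile time rather
than in the field.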
Without these properties it is almost impossible to write any 'real'
application in Haskell, mainly because most programs need to do at
least one interesting thing, where interesting means interacting with
the outside world in a new or undefined manner. The programmer needs to
know his Haskell program or library will work as expected across
platforms.

The solution need not be the best or even the most efficient. It
merely needs to solve these problems in a way that is consistent
with the current language APIs and definition, and that is not
difficult for any compiler writer to implement. The reason is that it
is unclear at this time what a completely efficient API would look
like. Different compilers have their own efficient IO extensions,
which people can use if they are that concerned, but it needs to be
possible to write portable programs and libraries with the above
properties independently of those extensions. That is what I designed
my API to be: simply implemented on top of the current compilers'
private binary APIs, yet simple and modeled after the current Prelude
functions so as to mesh well with the existing codebase and mindshare.
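
As a rough idea of the shape such an API could take, here is a sketch
of byte primitives layered on facilities GHC already exposes in
System.IO (withBinaryFile, hGetBuf, hPutBuf). The names hGetByte,
hPutByte, readFileBytes and writeFileBytes are hypothetical, chosen
here to mirror hGetChar, hPutChar, readFile and writeFile; they are
not the actual names or semantics of the proposal, and other
compilers would need their own implementation underneath.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import System.IO

-- Hypothetical primitive: write one raw byte to a handle.
-- hPutBuf bypasses the handle's text encoding entirely.
hPutByte :: Handle -> Word8 -> IO ()
hPutByte h b = alloca $ \p -> poke p b >> hPutBuf h p 1

-- Hypothetical primitive: read one raw byte from a handle.
hGetByte :: Handle -> IO Word8
hGetByte h = alloca $ \p -> do
  n <- hGetBuf h p 1
  if n == 1
    then peek p
    else ioError (userError "hGetByte: end of file")

-- Byte analogue of writeFile (hypothetical name).
writeFileBytes :: FilePath -> [Word8] -> IO ()
writeFileBytes path bs =
  withBinaryFile path WriteMode $ \h -> mapM_ (hPutByte h) bs

-- Byte analogue of readFile (hypothetical name). Unlike readFile
-- this reads strictly, and one byte at a time, so it is simple
-- rather than fast.
readFileBytes :: FilePath -> IO [Word8]
readFileBytes path = withBinaryFile path ReadMode go
  where
    go h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          b  <- hGetByte h
          bs <- go h
          return (b : bs)
```

Everything else (XDR, RIFF, a custom character decoder) can then be
written as ordinary Haskell over [Word8], with efficiency left to
compiler-specific extensions.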
John
--
--------------------------------------------------------------
John Meacham http://repetae.net/~john/ [EMAIL PROTECTED]
California Institute of Technology, Alum. [EMAIL PROTECTED]
--------------------------------------------------------------