On 10/14/2010 22:24, Andrei Alexandrescu wrote:
> On 10/14/10 21:22 CDT, Rainer Deyke wrote:
>> Characters data must be encoded into bytes before it is written and
>> decoded before it is read. The low-level OS functions only deal with
>> bytes, not characters.
>
> I'm not so sure about that. For example, some code in std.stdio is
> dedicated to supporting fwide():
>
> http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html

I don't think that's a low-level OS function. But it is true that I may
have overstated my case. Still, the underlying file system and the
underlying hardware deal in bytes, not chars, on all platforms that
matter. Encoded text /is/ bytes.

> So the $1M question is, do we support text transports or not?

All text is encoded, and encoded text is logically bytes, not chars.
This distinction is somewhat confused in D because the native string
types in D do specify an encoding. However, it would be a mistake to
conflate the internal encoding with the external encoding used by text
transports.

It's also worth noting that some of these text transports are not 8-bit
clean. This means that they cannot transport UTF-8 (without
transcoding), which means that they cannot transport D strings.

> - email protocol and probably other Internet protocols

All internet protocols ultimately work over IP, and IP is a binary
protocol.

> If we don't support text at the transport level, things can still made
> to work but in a more fragile manner: upper-level protocols will need to
> _know_ that although the API accepts any ubyte[], in fact the results
> would be weird and malfunctioning if the wrong things are being passed.

The situation for text would be no different from the situation for any
other structured binary format.

> A text-based transport would clarify at the type level that a text
> stream accepts only UTF-encoded characters.

You can still have that, as a wrapper around the byte stream.

-- 
Rainer Deyke - rain...@eldwood.com
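
To illustrate the "wrapper around the byte stream" idea, here is a minimal sketch, written in Python rather than D purely for brevity. The names TextWriter and is_seven_bit_clean are hypothetical and do not come from std.stdio or any other library mentioned in this thread. The sketch assumes the external encoding is UTF-8 and uses base64 as one possible transcoding for a transport that is not 8-bit clean.

```python
import base64
import io


class TextWriter:
    """Hypothetical text layer over any binary stream with write(bytes)."""

    def __init__(self, byte_stream, encoding="utf-8"):
        self._bytes = byte_stream      # the underlying transport deals in bytes
        self._encoding = encoding      # external encoding, chosen by the wrapper

    def write(self, text):
        # The type check is what the text layer adds: raw bytes are rejected.
        if not isinstance(text, str):
            raise TypeError("TextWriter accepts only str, not raw bytes")
        data = text.encode(self._encoding)
        self._bytes.write(data)
        return len(data)


def is_seven_bit_clean(data):
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(b < 0x80 for b in data)


transport = io.BytesIO()               # stands in for a file, socket, etc.
writer = TextWriter(transport)
writer.write("na\u00efve")             # one non-ASCII character

payload = transport.getvalue()         # what actually goes over the wire
armored = base64.b64encode(payload)    # transcoded for a 7-bit transport

print(payload, is_seven_bit_clean(payload))
print(armored, is_seven_bit_clean(armored))
```

The byte stream itself stays encoding-agnostic; only the wrapper knows about text. A transport that is not 8-bit clean would need the transcoded form, because the UTF-8 payload contains bytes with the high bit set.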