On 10/14/10 21:22 CDT, Rainer Deyke wrote:
On 10/14/2010 15:49, Andrei Alexandrescu wrote:
Good point. Perhaps indeed it's best to only deal with bytes and
characters at transport level.

Make that just bytes.

Character data must be encoded into bytes before it is written and
decoded before it is read.  The low-level OS functions only deal with
bytes, not characters.

I'm not so sure about that. For example, some code in std.stdio is dedicated to supporting fwide():

http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html

As far as I understand, a wide stream is essentially a UCS-2 (or UTF-16? Not sure) stream that is impossible to abstract away as a stream of bytes.
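
For readers unfamiliar with fwide(), here is a minimal C sketch of the orientation mechanism referred to above. It is an illustration only, not code from std.stdio, and the helper names stream_orientation and set_wide are made up. It shows that a stream starts with no orientation and that, once it becomes wide-oriented, a later fwide() call does not switch it back. It does not settle which encoding the implementation uses for the wide characters.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: report a stream's orientation without changing it.
   fwide(f, 0) only queries. Result: 1 = wide, -1 = byte, 0 = not yet set. */
static int stream_orientation(FILE *f) {
    int o = fwide(f, 0);
    return (o > 0) - (o < 0);
}

/* Hypothetical helper: ask for wide orientation on a stream.
   Returns 1 if the stream is wide-oriented afterwards, 0 otherwise. */
static int set_wide(FILE *f) {
    return fwide(f, 1) > 0;
}
```

On a freshly opened stream, stream_orientation() returns 0; after set_wide() succeeds, fwide(f, -1) still reports a wide stream, because the orientation of an already-oriented stream is not changed by fwide().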

I see Windows' commitment to fwide is... odd:

http://msdn.microsoft.com/en-us/library/aa985619%28VS.80%29.aspx

The ultimate question is whether we want to support that (as well as other dedicated text streams) or not.

Text encoding is a complicated process - consider different unicode
encodings, different non-unicode encodings, byte order markers, and
Windows versus Unix line endings.  Furthermore, it is often useful to
wedge an additional translation layer between the low-level (binary)
stream and the high-level text encoding layer, such as an encryption or
compression layer.

Writing characters directly to streams made sense in the pre-Unicode
world where there was a one-to-one correspondence between characters and
bytes.  In a modern world, text encoding is an important service that
deserves its own standalone module.

I'd say quite the opposite. Since now encodings are embedded all the way down at the low level (per fwide above), we can't pretend it's all bytes down there and leave characters to upper layers. There _are_ transports that deal with characters directly.

So the $1M question is, do we support text transports or not?

- fwide streams

- files for which isatty() returns true (http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html)

- email protocol and probably other Internet protocols

- others?
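
As a small illustration of the isatty() item in the list above, a POSIX C sketch might look like the following. The helper name is_terminal is hypothetical, and the sketch only detects a terminal; it says nothing about which character encoding that terminal expects.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict -std modes */
#include <stdio.h>
#include <unistd.h>              /* POSIX: isatty() */

/* Hypothetical helper: returns 1 if the stream is attached to a terminal,
   0 otherwise (including when the stream has no valid file descriptor). */
static int is_terminal(FILE *f) {
    int fd = fileno(f);
    return fd >= 0 && isatty(fd) == 1;
}
```

A temporary file or a pipe gives 0 here, while stdout in an interactive shell typically gives 1.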

If we don't support text at the transport level, things can still be made to work, but in a more fragile manner: upper-level protocols will need to _know_ that although the API accepts any ubyte[], the results will be weird and broken if the wrong things are passed. A text-based transport would make clear at the type level that a text stream accepts only UTF-encoded characters.
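
One way to picture the difference is the following C sketch. Everything in it is an assumption made for illustration: the types byte_transport and text_transport and the functions utf8_valid, byte_write and text_write are hypothetical, not a proposed std.stdio API. The byte transport accepts any buffer, while the text transport is a distinct type whose only write function rejects input that is not well-formed UTF-8.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if s[0..n) is well-formed UTF-8, 0 otherwise. */
static int utf8_valid(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                        /* stray continuation or bad lead byte */
        if (n - i < len) return 0;            /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += len;
    }
    return 1;
}

typedef struct { FILE *sink; } byte_transport;  /* accepts any bytes */
typedef struct { FILE *sink; } text_transport;  /* accepts UTF-8 only */

/* Writes the buffer as-is; returns the number of bytes written. */
static size_t byte_write(byte_transport t, const unsigned char *p, size_t n) {
    return fwrite(p, 1, n, t.sink);
}

/* Returns 0 on success, -1 if the input is rejected or the write fails. */
static int text_write(text_transport t, const unsigned char *p, size_t n) {
    if (!utf8_valid(p, n)) return -1;
    return fwrite(p, 1, n, t.sink) == n ? 0 : -1;
}
```

Because the two structs are distinct types, handing a byte_transport to text_write is a compile-time error, and bad input handed to a text_transport is refused at the boundary instead of silently corrupting the stream. With only a byte-level API, that check would have to live in every upper-level protocol.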

I think either way is not a catastrophe. We can make it work.


Andrei
