On 20-Nov-2013 14:45, Lars T. Kyllingstad wrote:
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
(c) A variety of text functions currently suffer because we don't
distinguish between validated UTF strings and potentially invalid
ones.

I think it is fair to always assume that a char[] is a valid UTF-8
string, and instead perform the validation when creating/filling the
string from a non-validated source.

Take std.file.read() as an example; it returns void[], but has a
validating counterpart in std.file.readText().
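
For illustration, a minimal sketch of the two calls side by side
("input.txt" is just a placeholder file name):

import std.file : read, readText;

void main()
{
    // read() hands back the raw bytes; no UTF validation is done.
    void[] raw = read("input.txt");

    // readText() validates the contents up front and returns a
    // string; it throws a UTFException on invalid UTF-8.
    string text = readText("input.txt");
}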

Sadly, doing so is horrifically slow, and above all, practicality must take precedence. Would you really want to validate the whole file, only to re-scan it anew later to, say, tokenize a source file?
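
A rough sketch of the alternative: let the lexer validate as it
decodes, in the single pass it has to make anyway ("source.d" is a
placeholder; the tokenizer itself is elided):

import std.file : read;
import std.utf : decode;

void main()
{
    // Grab the raw bytes once, with no separate validation pass.
    // The cast is fine here since nothing else aliases the buffer.
    auto src = cast(string) read("source.d");

    // decode() throws a UTFException the moment it hits an invalid
    // sequence, so validation piggybacks on the lexing scan.
    size_t i = 0;
    while (i < src.length)
    {
        dchar c = decode(src, i);
        // ... hand c to the tokenizer here ...
    }
}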


I think we should use ubyte[] to a greater extent for data which is
potentially *not* valid UTF.  Examples include interfacing with C
functions, where I think there is a tendency towards always translating
C char to D char, when they are in fact not equivalent.  Another example
is, again, std.file.read(), which currently returns void[].  I guess it
is a matter of taste, but I think ubyte[] would be more appropriate
here, since you can actually use it for something without casting it first.
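
As a sketch of how that might look at a C boundary (c_fetch_name is
a made-up C function; strlen comes from core.stdc.string):

import core.stdc.string : strlen;
import std.utf : validate;

// Binding the C side as ubyte* makes "not necessarily valid UTF-8"
// explicit in the signature.
extern(C) const(ubyte)* c_fetch_name();

string fetchName()
{
    const(ubyte)* p = c_fetch_name();
    auto raw = p[0 .. strlen(cast(const char*) p)];

    // Only now are the bytes promoted to a D string, after an
    // explicit validation step.
    auto s = cast(string) raw.idup;
    validate(s); // throws UTFException on invalid UTF-8
    return s;
}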

Otherwise, I think it's a good idea to encode high-level invariants in types. The only problem then is inadvertent template bloat.
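
A minimal sketch of what such a type might look like (ValidUTF is a
made-up name, not an actual Phobos type):

import std.utf : validate;

// Wraps a string whose UTF-8 validity was checked exactly once,
// at construction.
struct ValidUTF
{
    private string payload;

    this(const(ubyte)[] raw)
    {
        auto s = cast(string) raw.idup;
        validate(s); // throws UTFException if raw isn't valid UTF-8
        payload = s;
    }

    string get() const { return payload; }
}

(The bloat comes from every text-processing template getting
instantiated once for string and again for the wrapper type.)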

[snip]

--
Dmitry Olshansky
