Hello,
I just noticed that D's builtin *string types do not behave the same way when
they meet invalid code unit sequences. For instance:
void main () {
    assert("hæ?" == "\x68\xc3\xa6\x3f");
    // Note: removing \xa6 thus makes the literal invalid UTF-8.
    string s1 = "\x68\xc3\x3f";
    // ==> OK, accepted -- but writing s1 indeed produces "h�?".
    dstring s2 = "\x68\xc3\x3f";
    // ==> compile-time Error: invalid UTF-8 sequence
}
I guess this is because, when converting from string to dstring, that is, when
decoding code units to code points, D is forced to check the sequence's
validity. But this check is neither needed nor performed for a plain UTF-8
string. Am I right about this?
If yes, isn't it risky to leave utf8 strings (and wstrings?) unchecked? I mean,
to have a concrete safety difference compared with dstrings? I know there are
UTF checking routines in the std lib, but for dstrings one does not need to
call them explicitly.
(Note that for source code literals this checking is done at compile time.)
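For comparison, this is the kind of explicit call I have in mind, namely
std.utf.validate (again building the bytes at runtime, since literals are
already checked at compile time):

import std.stdio : writeln;
import std.utf : validate, UTFException;

void main ()
{
    char[] raw = ['\x68', '\xc3', '\x3f'];
    string s = raw.idup;

    // For char and wchar strings, validity must be checked explicitly.
    try
    {
        validate(s);   // throws UTFException on an invalid sequence
        writeln("valid UTF-8");
    }
    catch (UTFException e)
    {
        writeln("invalid UTF-8: ", e.msg);
    }
}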
denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com