On 16-Aug-2015 03:50, Walter Bright wrote:
> On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
>>> There is no reason to validate UTF-8 input. The only place where
>>> non-ASCII code units can even legally appear is inside strings, and
>>> there they can just be copied verbatim while looking for the end of
>>> the string.
>> The idea is to assume that any char based input is already valid UTF
>> (as D defines it), while integer based input comes from an unverified
>> source, so that it still has to be validated before being cast/copied
>> into a 'string'. I think this is a sensible approach, both semantically
>> and performance-wise.
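That split might look something like the following sketch (hypothetical `toJsonText` helpers, not actual Phobos or vibe.d code — only `std.utf.validate` is real):

```d
import std.utf : validate;

// Hypothetical sketch of the proposed policy: char-based input is assumed
// to already be valid UTF-8 (as the D spec requires for `string`), while
// ubyte-based input is treated as untrusted and validated before use.
string toJsonText(const(char)[] input)
{
    return input.idup; // trusted: D chars are defined to be valid UTF-8
}

string toJsonText(const(ubyte)[] input)
{
    auto text = cast(const(char)[]) input;
    validate(text);    // throws UTFException on malformed sequences
    return text.idup;
}
```

The overload set keeps the fast path (char input) validation-free while still rejecting malformed bytes from untrusted integer-based sources.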

> The json parser will work fine without doing any validation at all. I've
> been implementing string handling code in Phobos with the idea of doing
> validation only if the algorithm requires it, and only for those parts
> that require it.


Aye.
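The verbatim-copy approach for strings can be sketched roughly like this (hypothetical `scanString` helper, just to illustrate — it returns the raw payload without unescaping):

```d
// Hypothetical sketch: find the end of a JSON string without decoding
// UTF-8. Non-ASCII code units (>= 0x80) can only occur inside the string
// payload, so they are stepped over verbatim; only '"' and '\\' matter.
string scanString(const(char)[] input, ref size_t pos)
{
    assert(input[pos] == '"');
    immutable start = ++pos;
    while (pos < input.length)
    {
        immutable c = input[pos];
        if (c == '"')
            return input[start .. pos++].idup; // raw payload, unescaping omitted
        if (c == '\\')
            pos += 2; // skip '\' and the escaped character
        else
            ++pos;    // ASCII or multi-byte UTF-8 code unit alike
    }
    assert(0, "unterminated string");
}
```

Note that the scanner never produces a single code point; every byte is either a structural ASCII character or opaque payload.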

> There are many validation algorithms in Phobos one can tack on - having
> two implementations of every algorithm, one with an embedded reinvented
> validation and one without - is too much.

Actually, there are next to none. `validate`, which throws on failed validation, is a misnomer.

> The general idea with algorithms is that they do not combine things, but
> they enable composition.


At the lower level, such as in tokenizers, combining a couple of simple steps makes sense because it makes things run faster: it usually eliminates the need for a temporary result that must be digestible by the next range.

For instance, by "combining" decoding and character classification, one may side-step generating the code point value itself (because it no longer has to be produced for the top-level algorithm).
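A concrete case of such fusion: skipping whitespace in a JSON tokenizer never needs a decoded `dchar` at all, because all significant JSON whitespace is ASCII. A minimal sketch (hypothetical `skipWhitespace` helper):

```d
// Hypothetical sketch of fusing decoding with classification: JSON
// whitespace is pure ASCII, so the scanner classifies raw code units
// directly and never materialises a code point for the multi-byte
// sequences elsewhere in the input.
size_t skipWhitespace(const(char)[] input, size_t pos)
{
    while (pos < input.length)
    {
        switch (input[pos])
        {
            case ' ', '\t', '\r', '\n':
                ++pos;
                break;
            default:
                return pos; // first non-whitespace code unit
        }
    }
    return pos;
}
```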


--
Dmitry Olshansky
