On Wednesday, November 20, 2013 16:26:59 Walter Bright wrote:
> On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
> > ValidatedString would then avoid any extra validation when iterating over
> > the characters, though I don't know how much of an efficiency gain that
> > would actually be given that much of the validation occurs naturally when
> > decoding or using stride. It would have the downside that any function
> > which specializes on strings would likely have to then specialize on
> > ValidatedString as well. So, while I agree with the idea in concept, I'd
> > propose that we benchmark the difference in decoding and striding without
> > the checks and see if there actually is much difference. Because if there
> > isn't, then I don't think that it's worth going to the trouble of adding
> > something like ValidatedString.
>
> Utf validation isn't the only form of validation for strings. You could, for
> example, validate that the string doesn't contain SQL injection code, or
> contains a correctly formatted date, or has a name that is guaranteed to be
> in your employee database, or is a valid phone number, or is a correct
> email address, etc.
>
> Again, validation is not defined by D, it is defined by the constraints YOUR
> PROGRAM puts on it.

Yes, but we seemed to be discussing the possibility of having some kind of
type in Phobos which indicated that the string had been validated for UTF
correctness. I wouldn't expect other types of string validation to end up in
Phobos.

And without the type for UTF validation being in Phobos and specialized on in
Phobos functions, I don't think that I would ever want to use it. In such a
case, you lose out on all of the specialization that Phobos does for strings
and are stuck with a range of dchar. That forces a lot of extra decoding even
if some of the validation can be skipped because it was already done, whereas
a number of Phobos functions are able to specialize on narrow strings and
avoid decoding altogether. That performance boost would be lost if a string
were wrapped in a UTFValidatedString without Phobos specializing on
UTFValidatedString.

Based on how decode and stride work, it looks to me like the decoding costs
way more than the little bit of extra validation that is currently done as
part of it, so avoiding the decoding is likely to be a much greater
performance boost than avoiding those checks. If that is indeed the case, I
don't see much point to something like UTFValidatedString unless Phobos
specializes for it like it specializes for narrow strings.

Other types of string validation might very well be worth doing without
Phobos knowing about them, but having the wrapper type which indicates that
the validation has been done still needs to be worth more than the
performance hit of not being able to use naked strings anymore and losing any
performance gains that come from the functions which specialize for narrow
strings. That's probably true for strings that just get passed around, but it
probably isn't true for strings that end up being processed by range-based
functions a lot.

- Jonathan M Davis
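
To make the point about lost specialization concrete, here is a minimal
sketch. UTFValidatedString and countSpaces are hypothetical names written
purely for illustration; neither exists in Phobos. The wrapper validates once
with std.utf.validate and then exposes itself as a range of dchar, so a
function with a fast path for narrow strings falls back to its generic,
decoding path when it is handed the wrapper.

```d
import std.range : isInputRange, ElementType, empty, front, popFront;
import std.traits : isSomeString, isNarrowString;
import std.utf : validate, decode, stride;

// Hypothetical wrapper type (an assumption for illustration, not Phobos API).
struct UTFValidatedString
{
    private string data;

    this(string s)
    {
        validate(s); // throws UTFException if s is not valid UTF-8
        data = s;
    }

    @property bool empty() { return data.length == 0; }

    // Each call decodes one code point from the front of the string.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    void popFront() { data = data[stride(data, 0) .. $]; }
}

// The wrapper is a range of dchar, but it is not a string as far as
// std.traits is concerned, so string specializations do not apply to it.
static assert(isInputRange!UTFValidatedString);
static assert(is(ElementType!UTFValidatedString == dchar));
static assert(!isSomeString!UTFValidatedString);
static assert(!isNarrowString!UTFValidatedString);

// Hypothetical function showing the pattern of specializing on narrow strings.
size_t countSpaces(R)(R r)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t n = 0;
    static if (isNarrowString!R)
    {
        // Fast path: ' ' is always a single code unit in UTF-8 and UTF-16,
        // so the code units can be scanned without decoding.
        foreach (i; 0 .. r.length)
            if (r[i] == ' ') ++n;
    }
    else
    {
        // Generic path: front decodes a code point on every iteration.
        for (; !r.empty; r.popFront())
            if (r.front == ' ') ++n;
    }
    return n;
}

void main()
{
    string s = "a b c";
    assert(countSpaces(s) == 2);                     // code-unit path
    assert(countSpaces(UTFValidatedString(s)) == 2); // decoding path
}
```

Both calls give the same count, but only the naked string takes the code-unit
path. The sketch says nothing about how large the difference is; that is what
a benchmark would have to show.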