On Monday, 23 April 2012 at 23:52:41 UTC, bearophile wrote:
> James Miller:
>> I realised that when you want the number of characters, you
>> normally actually want to use walkLength, not length.
>
> As with strlen() in C, unfortunately the result of
> walkLength(somestring) is computed every time you call it,
> because it doesn't get cached.
> A partial improvement would be to ensure that
> walkLength(somestring) is strongly pure, and that the D
> compiler is able to hoist this loop-invariant pure computation
> out of loops.
>
>> Is it reasonable for the compiler to pick this up during
>> semantic analysis and point out this situation?
>
> This is not easy to do, because sometimes you want to know the
> number of code points, and sometimes the number of code units.
> I even remember a proposal to rename the "length" field of
> narrow strings to something else, to avoid such bugs.

I was thinking about that. This is quite a vague suggestion; I'm
mostly just throwing the idea out there to see what people think.
I am aware of the issue of walkLength being computed every time,
rather than being a constant-time lookup. One option would be to
make it only a warning in @safe code, so the worst-case scenario
is that you mark the function as @trusted. I feel this fits in
with the idea of @safe quite well, since you have to explicitly
tell the compiler that you know what you're doing.

Another option would be a general lint tool that picks up on
these kinds of potential errors, though that has a much bigger
scope...
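
For anyone less familiar with the length/walkLength distinction,
here is a minimal D sketch, assuming a current Phobos where
walkLength is importable from std.range. On a narrow string,
.length counts UTF-8 code units, while walkLength decodes and
counts code points, walking the whole string on each call.
Hoisting the call out of the loop by hand is the manual version
of what bearophile would like the compiler to do automatically.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // 'é' (U+00E9) encodes to two UTF-8 code units.
    string s = "h\u00e9llo";

    writeln(s.length);     // code units (chars): prints 6
    writeln(s.walkLength); // code points (decoded dchars): prints 5

    // walkLength is O(n) on a narrow string: like strlen() in C, putting it
    // in a loop condition re-walks the whole string on every iteration.
    // Computing it once, outside the loop, avoids the repeated walk.
    immutable n = s.walkLength;
    foreach (i; 0 .. n)
    {
        // ... per-iteration work that needs the code-point count n ...
    }
}
```
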
--
James Miller