On Thu, 13 Jan 2011 15:51:00 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
On 1/13/11 11:35 AM, Steven Schveighoffer wrote:
On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
Let's take a look:
// Incorrect string code
void fun(string s) {
    foreach (i; 0 .. s.length) {
        writeln("The character in position ", i, " is ", s[i]);
    }
}
// Incorrect string_t code
void fun(string_t!char s) {
    foreach (i; 0 .. s.codeUnits) {
        writeln("The character in position ", i, " is ", s[i]);
    }
}
Both functions are incorrect, albeit in different ways. The only
improvement I'm seeing is that the user needs to write codeUnits
instead of length, which may make her think twice. Clearly, however,
copiously incorrect code can be written with the proposed interface
because it tries to hide the reality that underneath a variable-length
encoding is being used, but doesn't hide it completely (albeit for
good efficiency-related reasons).
You might be looking at my previous version. The new version (recently
posted) will throw an exception for that code if a multi-code-unit
code-point is found.
I was looking at your latest. It's code that compiles and runs, but
dynamically fails on some inputs. I agree that it's often better to fail
noisily instead of silently, but in a manner of speaking the
string-based code doesn't fail at all - it correctly iterates the code
units of a string. This may sometimes not be what the user expected;
most of the time they'd care about the code points.
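To make the code unit versus code point distinction concrete, here is a minimal sketch using the built-in string type and std.range.walkLength. The helper names countCodeUnits and countCodePoints are made up for illustration and are not part of either proposal.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: number of UTF-8 code units (bytes) in the string.
size_t countCodeUnits(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points. walkLength treats the
// string as a range of dchar, so it decodes while it counts.
size_t countCodePoints(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "héllo"; // 'é' occupies two UTF-8 code units
    writeln(countCodeUnits(s), " code units, ",
            countCodePoints(s), " code points");
}
```

The two counts agree only for pure ASCII input, which is why index-based loops over s.length look right in testing and then surprise people later.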
Iterating the code units is possible by accessing the array data, i.e.
you could do:
    foreach(i, c; s.data)
if you want the code-units.
That is the point of having a separate type. Using string_t tells the
library "I'm using this data as a string". Using char[] tells the library
"I'm using this data as an array."
The difference here is that you have to *specifically* try to access the
code units; the default is code-points. All it really does is switch the
default.
It also supports this:
foreach(i, d; s)
{
    writeln("The character in position ", i, " is ", d);
}
where i is the index (might not be sequential)
Well string supports that too, albeit with the nit that you need to
specify dchar.
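For reference, a small sketch of what that looks like with the built-in string type. The helper codePointOffsets is a hypothetical name used only to make the indices visible; the point is that i is the code-unit offset at which each code point starts, so it is not sequential.

```d
import std.stdio : writeln;

// Hypothetical helper: collects the code-unit offset at which each
// code point of s begins.
size_t[] codePointOffsets(string s)
{
    size_t[] offsets;
    // Declaring the element as dchar makes foreach decode UTF-8;
    // i is the index of the first code unit of each code point.
    foreach (i, dchar d; s)
        offsets ~= i;
    return offsets;
}

void main()
{
    foreach (i, dchar d; "aéb")
        writeln("The character in position ", i, " is ", d);
    // The positions visited are 0, 1 and 3, because 'é' takes two code units.
}
```

Dropping the dchar from the loop variable silently switches back to code units, which is the nit being discussed.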
This is not a small problem.
isRandomAccessRange requires hasLength (see here:
http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L532).
This is not a random access range per that definition.
That's an interesting twist. By the way I specified length is required
then because I couldn't imagine having random access into something that
I can't tell the length of. Apparently I was wrong :o).
Yes, in fact, you could say that specifically defines VLERange ;) But
actually, there are two types of VLE ranges, those which can be randomly
accessed (where determining the beginning of a code point, given a random
index, is possible) and those that cannot (where decoding depends on the
exact order of the data). Actually, those would not be bi-directional
ranges anyways.
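UTF-8 falls into the first category. As a rough sketch (assuming valid UTF-8 and an in-bounds index; codePointStart is a made-up name, not something in Phobos or the proposal), finding the start of the code point that contains a given index only needs a short backward scan:

```d
import std.stdio : writeln;

// Hypothetical helper: returns the index of the first code unit of the
// code point that contains s[i]. Assumes s is valid UTF-8 and i < s.length.
size_t codePointStart(string s, size_t i)
{
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx,
    // so step back until we reach a byte that is not one.
    while (i > 0 && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

void main()
{
    string s = "aéb"; // code units: 'a' at 0, 'é' at 1..2, 'b' at 3
    writeln(codePointStart(s, 2)); // index 2 is in the middle of 'é'
}
```

An encoding where a unit cannot be classified without reading everything before it would not allow this.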
But a string
isn't a random access range anyways (it's specifically disallowed by
std.range per that same reference).
It isn't and it isn't supposed to be.
I agree with that assessment, which is why I omitted length.
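For anyone who wants to check the std.range side of this, a small sketch using the range traits. This reflects how Phobos classifies narrow strings as far as I know; the trait definitions may differ between releases, so treat it as a check to run rather than a guarantee.

```d
import std.range : hasLength, isBidirectionalRange, isRandomAccessRange;
import std.stdio : writeln;

// As a range, a UTF-8 string is deliberately treated as a sequence of
// code points: no length, no random access, but still bidirectional.
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);
static assert(isBidirectionalRange!string);

// A dstring has one code unit per code point, so it keeps random access.
static assert(isRandomAccessRange!dstring);

void main()
{
    writeln("string is random access: ", isRandomAccessRange!string);
    writeln("dstring is random access: ", isRandomAccessRange!dstring);
}
```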
-Steve