On 1/13/11 11:35 AM, Steven Schveighoffer wrote:
> On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu
> <seewebsiteforem...@erdani.org> wrote:
>> Let's take a look:
>>
>> // Incorrect string code
>> void fun(string s) {
>>     foreach (i; 0 .. s.length) {
>>         writeln("The character in position ", i, " is ", s[i]);
>>     }
>> }
>>
>> // Incorrect string_t code
>> void fun(string_t!char s) {
>>     foreach (i; 0 .. s.codeUnits) {
>>         writeln("The character in position ", i, " is ", s[i]);
>>     }
>> }

>> Both functions are incorrect, albeit in different ways. The only
>> improvement I'm seeing is that the user needs to write codeUnits
>> instead of length, which may make her think twice. Clearly, however,
>> copiously incorrect code can be written with the proposed interface,
>> because it tries to hide the reality that a variable-length encoding
>> is being used underneath, but doesn't hide it completely (albeit for
>> good efficiency-related reasons).

> You might be looking at my previous version. The new version (recently
> posted) will throw an exception for that code if a multi-code-unit
> code point is found.

I was looking at your latest. It's code that compiles and runs but fails dynamically on some inputs. I agree that it's often better to fail noisily than silently, but in a manner of speaking the string-based code doesn't fail at all: it correctly iterates over the code units of a string. That may not be what the user expected; most of the time users care about code points.
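To make the code unit versus code point distinction concrete, here is a minimal sketch against a present-day D compiler and Phobos (not the 2011 versions discussed in this thread). The sample string "héllo" and the helper names codeUnitCount and codePointCount are illustrative only. walkLength counts code points here because Phobos decodes narrow strings to dchar when it treats them as ranges.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Number of UTF-8 code units (bytes) in s: this is what s.length reports.
size_t codeUnitCount(string s) { return s.length; }

// Number of code points: walkLength iterates a string as a range of dchar
// (Phobos auto-decoding), so each multi-byte sequence counts once.
size_t codePointCount(string s) { return walkLength(s); }

void main()
{
    string s = "héllo"; // 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9
    writeln(codeUnitCount(s));  // 6
    writeln(codePointCount(s)); // 5
    // s[1] is only the first half of 'é', not a character on its own.
    writeln(s[1] == '\xC3');    // true
}
```

This is why indexing with s[i] in a loop bounded by s.length visits six positions for a five-character word.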

> It also supports this:
>
> foreach (i, d; s)
> {
>     writeln("The character in position ", i, " is ", d);
> }
>
> where i is the index (which might not be sequential).

Well, string supports that too, with the nit that you need to specify the element type as dchar, as in foreach (i, dchar d; s).
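A minimal sketch of that dchar form, again assuming a current D toolchain; codePointOffsets is a hypothetical helper written for this example. With dchar as the element type, foreach decodes UTF-8 as it goes, and i is the offset of each code point's first code unit, which is why the indices are not sequential.

```d
import std.stdio : writeln;

// Collects the starting code-unit offset of each code point in s.
// Declaring the loop element as dchar makes foreach decode UTF-8 on the fly.
size_t[] codePointOffsets(string s)
{
    size_t[] offsets;
    foreach (i, dchar d; s)
        offsets ~= i;
    return offsets;
}

void main()
{
    string s = "héllo";
    // Prints one line per code point, with 'é' shown as a whole character.
    foreach (i, dchar d; s)
        writeln("The character in position ", i, " is ", d);
    // 'é' spans code units 1 and 2, so offset 2 never appears.
    writeln(codePointOffsets(s)); // [0, 1, 3, 4, 5]
}
```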

>> But wait, there's less. Functions for random-access ranges throughout
>> Phobos routinely assume a fixed-length encoding, i.e. that s[i + 1] lies
>> next to s[i]. From a cursory look at string_t, std.range will qualify it
>> as a RandomAccessRange without length. That's an odd beast, but it does
>> not change the fixed-length encoding assumption. So you'd need to
>> special-case algorithms for string_t, just as certain algorithms are
>> specialized for string right now.

> isRandomAccessRange requires hasLength (see here:
> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L532).
> This is not a random access range per that definition.

That's an interesting twist. By the way, I made length a requirement back then because I couldn't imagine having random access into something whose length I can't tell. Apparently I was wrong :o).

> But a string
> isn't a random access range anyway (it's specifically disallowed by
> std.range per that same reference).

It isn't and it isn't supposed to be.
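The classification can be checked at compile time. This sketch targets a current Phobos, where these traits live in std.range.primitives; the definitions have changed since the 2011 trunk linked above (for example, isRandomAccessRange now also admits infinite ranges that have no length), so treat it as an illustration of today's behavior rather than of the code under discussion.

```d
import std.range.primitives : hasLength, isBidirectionalRange,
    isRandomAccessRange;

// Narrow strings are bidirectional ranges of dchar, but Phobos deliberately
// denies them length and random access, since neither maps to code points.
static assert(isBidirectionalRange!string);
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);

// A dchar[] stores one code point per element, so it qualifies fully.
static assert(hasLength!(dchar[]));
static assert(isRandomAccessRange!(dchar[]));

void main() {} // all checks above run at compile time
```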

> The plan is that you would *not* have to special-case algorithms for
> string_t as you currently do for char[]. If that's not the case, then we
> haven't achieved much. Simply put, we are separating the strange nature
> of strings from arrays, so their exceptional treatment is handled by the
> type itself, not by the functions using it.

That sounds reasonable.
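As a rough illustration of "the type handles its own strangeness," here is a hypothetical wrapper, CodePoints. It is not Steven's string_t; its name and API are assumptions made for this example. It keeps UTF-8 storage but exposes only a bidirectional range of dchar, built on std.utf's decode, stride and strideBack, so generic algorithms such as retro and equal work on it without string-specific branches. It assumes well-formed UTF-8; decode throws a UTFException otherwise.

```d
import std.algorithm.comparison : equal;
import std.range : retro;
import std.range.primitives : ElementType, isBidirectionalRange;
import std.utf : decode, stride, strideBack;

// Hypothetical wrapper (not string_t): it owns the UTF-8 decoding,
// so generic algorithms only ever see a range of dchar.
struct CodePoints
{
    string data;

    @property bool empty() { return data.length == 0; }

    // Decode the code point that starts at offset 0.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    // Drop however many code units the first code point occupies.
    void popFront() { data = data[stride(data, 0) .. $]; }

    // Decode the code point whose last code unit ends the string.
    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);
    }

    // Drop however many code units the last code point occupies.
    void popBack() { data = data[0 .. $ - strideBack(data, data.length)]; }

    @property CodePoints save() { return this; }
}

static assert(isBidirectionalRange!CodePoints);
static assert(is(ElementType!CodePoints == dchar));

void main()
{
    import std.stdio : writeln;
    // retro and equal need no knowledge of UTF-8.
    writeln(CodePoints("héllo").retro);
    writeln(equal(CodePoints("héllo").retro, "olléh"d));
}
```

The wrapper deliberately offers neither length nor indexing, so algorithms that assume s[i + 1] follows s[i] cannot be instantiated with it.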


Andrei