Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Thu, 13 Jan 2011 19:10:42 -0800

On 2011-01-13 15:51:00 -0500, Andrei Alexandrescu<seewebsiteforem...@erdani.org> said:

On 1/13/11 11:35 AM, Steven Schveighoffer wrote:

On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:

Let's take a look:


// Incorrect string code
void fun(string s) {
    foreach (i; 0 .. s.length) {
        writeln("The character in position ", i, " is ", s[i]);
    }
}

// Incorrect string_t code
void fun(string_t!char s) {
    foreach (i; 0 .. s.codeUnits) {
        writeln("The character in position ", i, " is ", s[i]);
    }
}

Both functions are incorrect, albeit in different ways. The only
improvement I'm seeing is that the user needs to write codeUnits
instead of length, which may make her think twice. Clearly, however,
copiously incorrect code can be written with the proposed interface
because it tries to hide the reality that underneath a variable-length
encoding is being used, but doesn't hide it completely (albeit for
good efficiency-related reasons).


You might be looking at my previous version. The new version (recently
posted) will throw an exception for that code if a multi-code-unit
code-point is found.

I was looking at your latest. It's code that compiles and runs, but dynamically fails on some inputs. I agree that it's often better to fail noisily instead of silently, but in a manner of speaking the string-based code doesn't fail at all - it correctly iterates the code units of a string. This may sometimes not be what the user expected; most of the time they'd care about the code points.

That's forgetting that most of the time people care about graphemes (user-perceived characters), not code points.

It also supports this:

foreach(i, d; s)
{
    writeln("The character in position ", i, " is ", d);
}

where i is the index (might not be sequential)

Well, string supports that too, albeit with the nit that you need to specify dchar.
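
For reference, a minimal sketch of the built-in string form with dchar specified. The helper name codePointStarts and the sample text are illustrative assumptions, not something from the thread. It shows that the index is the offset of each code point's first code unit, so it is not sequential once a multi-byte code point appears.

```d
import std.stdio : writeln;

// Hypothetical helper: collects the index that foreach reports
// for each decoded code point of a UTF-8 string.
size_t[] codePointStarts(string s)
{
    size_t[] starts;
    // Declaring the element as dchar makes foreach decode code points;
    // i is the offset of the first code unit of each one.
    foreach (i, dchar d; s)
        starts ~= i;
    return starts;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    foreach (i, dchar d; s)
        writeln("The character in position ", i, " is ", d);

    writeln(codePointStarts(s)); // [0, 1, 3, 4, 5]
}
```

Position 2 never appears because it falls in the middle of the two-byte encoding of U+00E9.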

Except it breaks with combining characters. For instance, take the string "t̃", which is two code points -- 't' followed by combining tilde (U+0303) -- and you'll get the following output:


        The character in position 0 is t
        The character in position 1 is ̃

(Note that the tilde becomes combined with the preceding space character.)

The conception of character that normal people have does not match the notion of code points when combining characters enter the equation.
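
The three notions of length can be compared directly on that same string. This is only a sketch, assuming a Phobos version that provides std.uni.byGrapheme; the helper names are hypothetical and chosen for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers, one per notion of "length".
size_t codeUnitCount(string s)  { return s.length; }                // UTF-8 code units
size_t codePointCount(string s) { return s.walkLength; }            // decoded code points
size_t graphemeCount(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    string s = "t\u0303"; // 't' followed by U+0303 COMBINING TILDE

    writeln(codeUnitCount(s));  // 3
    writeln(codePointCount(s)); // 2
    writeln(graphemeCount(s));  // 1
}
```

Only the last count matches what a reader would call one character.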



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
