On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:
On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.g...@gmx.ch> wrote:

On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.g...@gmx.ch> wrote:

On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:

So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.


But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?


That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile. $ is not
the length of the string range (meaning the number of code points). The
given slice operation might actually create an invalid string.
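To make the failure mode concrete, here is a minimal sketch, assuming a current D compiler and Phobos (`std.utf.validate`, `std.range.walkLength`). The sample string is made up for illustration; the point is only that `$ / 2` counts code units, so it can land in the middle of a multi-byte sequence.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

void main()
{
    // 'd', then U+00ED (two UTF-8 code units), then 'a'.
    string s = "d\u00EDa";

    assert(s.length == 4);      // number of code units
    assert(s.walkLength == 3);  // number of code points

    // $ is s.length here, so this slice is s[0 .. 2]:
    // 'd' plus only the first byte of the two-byte sequence.
    string half = s[0 .. $ / 2];

    // The slice compiles and runs, but it is not well-formed UTF-8.
    assertThrown!UTFException(validate(half));
}
```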

Programmers have to be aware of that if they want efficient code that
deals with unicode. I think having random access to the code units and
being able to iterate per code point is fine, because it gives you the
best of both worlds. Manually decoding a string and slicing it at
positions that were remembered to be safe has been good enough for me,
at least it is efficient.
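A rough sketch of the "decode manually, remember safe positions" approach, assuming `std.utf.decode` from Phobos. The helper name `indexAfter` is hypothetical, made up for this example; it is not a Phobos function.

```d
import std.utf : decode, validate;

/// Hypothetical helper: returns the code-unit index just past the first
/// `n` code points of `s` (or s.length if `s` has fewer than `n`).
size_t indexAfter(string s, size_t n)
{
    size_t i = 0;
    for (size_t k = 0; k < n && i < s.length; ++k)
        decode(s, i); // decodes one code point and advances i past it
    return i;
}

void main()
{
    string s = "d\u00EDa";          // 4 code units, 3 code points

    // Remember a position that is known to sit on a code point boundary.
    size_t cut = indexAfter(s, 2);
    assert(cut == 3);

    // Slicing at the remembered index is O(1) and stays well-formed.
    string head = s[0 .. cut];
    validate(head);                 // does not throw
    assert(head == "d\u00ED");
}
```

The decoding pass is linear, but it only has to happen once; every later slice at a remembered index is a plain code-unit slice.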

I find the same. I don't think I've ever dealt with arbitrary math
operations to do slices of strings like the above. I only slice a string
when I know the bounds are sane.

Like I said, it's a compromise. The "right" thing to do is probably not
even allow code-unit access via index (some have even argued that
code-point slicing is too dangerous, because you can split a grapheme,
leaving a valid, but incorrect slice of the original).

It's tricky, because you want fast slicing, but only certain slices are
valid. I once created a string type that used a char[] as its backing,
but actually implemented the limitations that std.range tries to enforce
(but cannot). It's somewhat of a compromise. If $ was mapped to
s.length, it would fail to compile, but I'm not sure what I *would* use
for $. It actually might be the code units, which would not make the
above line invalid.

-Steve

Well it would have to be consistent for a string type that "does it
right". Either the string is indexed with code units or it is indexed
with code points, and the other option should be provided. Dollar should
just be the length of what is used for indexing/slicing here, and
having that be different from length makes for a somewhat awkward
interface imho.

Except we are defining a string as a *range* and a range's length is
defined as the number of elements.

Note that hasLength!string evaluates to false in std.range.
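This can be checked at compile time. The sketch below assumes a Phobos version in which narrow strings are still treated as ranges of code points (auto-decoding); only traits exported by `std.range` are used.

```d
import std.range : ElementType, hasLength, isBidirectionalRange,
    isRandomAccessRange;

// As a range, a narrow string yields code points, not code units.
static assert(is(ElementType!string == dchar));

// It has a .length property, but the range traits do not count it,
// because that length is in code units rather than elements.
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);

// It can still be walked from either end.
static assert(isBidirectionalRange!string);

// For comparison, an ordinary array has both.
static assert(hasLength!(int[]));
static assert(isRandomAccessRange!(int[]));

void main() {}
```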

Ok. I feel the way narrow strings are handled in Phobos is a reasonable trade-off.


$ should denote the end point of the aggregate, but it does not have to
be equivalent to length, or even an integer/uint. It should just mean
"end".

Point taken. What is the solution for infinite ranges? Should any arithmetic on $ just be disallowed?
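One possible answer, sketched under the assumption of a compiler that supports a user-defined `opDollar`: let `$` evaluate to a marker type that carries no value. The names `End` and `Naturals` are hypothetical and exist only for this illustration. Because `End` defines no operators, arithmetic on `$` is rejected by the type system without needing a special rule.

```d
/// Hypothetical marker type: carries no value, it just means "end".
struct End {}

/// Hypothetical infinite range of the naturals start, start + 1, ...
struct Naturals
{
    size_t start;

    enum bool empty = false;                  // never runs out
    @property size_t front() const { return start; }
    void popFront() { ++start; }

    /// `$` inside r[...] evaluates to this marker, not to an integer.
    End opDollar() const { return End(); }

    /// r[lo .. $]: drop the first `lo` elements, keep the infinite tail.
    Naturals opSlice(size_t lo, End) const { return Naturals(start + lo); }
}

void main()
{
    auto r = Naturals(0);

    auto tail = r[5 .. $];
    assert(tail.front == 5);

    // No arithmetic is defined on End, so this is a compile-time error.
    static assert(!__traits(compiles, r[0 .. $ / 2]));
}
```

A finite aggregate could use the same pattern with a richer marker type, or simply return an integer from `opDollar` where that is meaningful.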


I also proposed a while back to have ^ denote the beginning (similar to
regex) of an aggregate for aggregates that don't use 0 as the beginning,
but people didn't like it :)

-Steve

=D, well, it is grammatically unambiguous!
