Andrei Alexandrescu wrote:
> It's no secret that string et al. are not a magic recipe for writing
> correct Unicode code. However, things are pretty good and could be
> further improved by operating the following changes in std.array and
> std.range:
>
> - make front() and back() for UTF-8 and UTF-16 automatically decode the
> first and last Unicode character

They would yield dchar, right? Wouldn't that cause trouble in templated code?
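For illustration, a minimal sketch of what that trouble looks like, assuming the proposed auto-decoding behavior (which is what today's Phobos does for narrow strings): the element type of string becomes dchar, so templated code that expects it to match the code-unit type char breaks.

```d
import std.range;

void main()
{
    // With auto-decoding, the element type of string is dchar, not char.
    static assert(is(ElementType!string == dchar));

    string s = "ağ";
    assert(s.front == 'a');   // front yields a decoded dchar
    s.popFront();
    assert(s.front == 'ğ');   // the full character, not a lone code unit
}
```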

> - make popFront() and popBack() skip one entire Unicode character
> (instead of just one code unit)

That's perfectly fine, because the opposite operations do "encode":

    string s = "ağ";
    assert(s.length == 3);   // 'a' is 1 code unit, 'ğ' is 2 in UTF-8
    s ~= 'ş';
    assert(s.length == 5);   // 'ş' encodes to 2 more code units

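The decode side would mirror that: a sketch assuming popFront() is changed as proposed, so that each call skips one whole character rather than one code unit.

```d
import std.range;

void main()
{
    string s = "ağ";
    s.popFront();            // skips 'a': one code unit
    assert(s.length == 2);   // "ğ" remains: two code units in UTF-8
    s.popFront();            // skips 'ğ': two code units at once
    assert(s.empty);
}
```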
> - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings

Ok.

> - change hasLength to return false for UTF-8 and UTF-16 strings

I don't understand that one. Strings do have lengths; it's just that adding or removing a single character does not always change the length by 1 for those types. I don't think that's a big deal: it is already so in the language for those types. dstring does not have that problem and can be used when by-1 changes are desired.
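To illustrate the dstring point: in UTF-32 every character is a single code unit, so length really is the character count there, while for string it is the UTF-8 code-unit count.

```d
void main()
{
    string s = "ağş";
    assert(s.length == 5);   // UTF-8 code units: 1 + 2 + 2
    dstring d = "ağş"d;
    assert(d.length == 3);   // UTF-32: one code unit per character
}
```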

> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".

The change sounds ok and hasLength should yield true. Or... can it return an enum { no, kind_of, yes } ;)

The current std.utf.decode takes the index by reference and advances it by the number of code units consumed. Could popFront() do something similar?
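That by-reference interface looks like this (a sketch; the exact signature may differ between versions):

```d
import std.utf;

void main()
{
    string s = "ağ";
    size_t i = 0;
    dchar c = decode(s, i);  // decodes 'a', advances i past it
    assert(c == 'a' && i == 1);
    c = decode(s, i);        // 'ğ' spans two code units
    assert(c == 'ğ' && i == 3);
}
```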

I think that's it: front() and popFront() are separated for cohesion. What is causing trouble here is the separation of "by-N" from popFront().

You are concerned that the user will assume that popFront() reduces the length by 1. I think that is the real problem here.

How about something like:

  // returns the amount by which the next popFront() will reduce length
  int nextStep();
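For strings, such a nextStep() could be sketched on top of std.utf.stride, which reports how many code units the character at a given index occupies (nextStep itself is hypothetical, from the suggestion above):

```d
import std.utf;

// Hypothetical: how many code units the next popFront() would consume.
size_t nextStep(string s)
{
    return stride(s, 0);
}

void main()
{
    string s = "ağ";
    assert(nextStep(s) == 1); // 'a' is one code unit
    s = s[1 .. $];
    assert(nextStep(s) == 2); // 'ğ' is two code units
}
```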

Ali
