Ali Çehreli wrote:
Andrei Alexandrescu wrote:
 > It's no secret that string et al. are not a magic recipe for writing
 > correct Unicode code. However, things are pretty good and could be
 > further improved by operating the following changes in std.array and
 > std.range:
 >
 > - make front() and back() for UTF-8 and UTF-16 automatically decode the
 > first and last Unicode character

They would yield dchar, right? Wouldn't that cause trouble in templated code?

Yes, dchar. Some parts of Phobos needed adjusting to cope with that, but the gains are well worth it.

The simplifications are enormous. Until now, Phobos didn't hit the nail on the head with simple encoding/decoding/transcoding primitives. There were many attempts in std.utf, std.encoding, and std.string - all very clunky to use. Now I can just write s.front to get the first dchar of any string, and s.popFront to drop it. Very simple!
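As a minimal sketch of what that looks like in practice (assuming a current Phobos, where the array versions of front, popFront and empty live in std.range.primitives; codePoints is a hypothetical helper written only for this illustration):

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: collect the code points of a UTF-8 string
// using nothing but the range primitives.
dchar[] codePoints(string s)
{
    dchar[] result;
    while (!s.empty)
    {
        result ~= s.front; // decodes one code point and yields a dchar
        s.popFront();      // skips the whole encoded sequence
    }
    return result;
}

void main()
{
    string s = "ağ";
    static assert(is(typeof(s.front) == dchar));
    assert(s.length == 3);            // three UTF-8 code units
    assert(codePoints(s) == "ağ"d);   // but only two code points
}
```

The loop never touches an index or calls a decoding function directly, which is the simplification being described.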

 > - make popFront() and popBack() skip one entire Unicode character
 > (instead of just one code unit)

That's perfectly fine, because the opposite operations do "encode":

    string s = "ağ";
    assert(s.length == 3);
    s ~= 'ş';
    assert(s.length == 5);

 > - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings

Ok.

 > - change hasLength to return false for UTF-8 and UTF-16 strings

I don't understand that one. Strings have lengths; it's just that adding or removing a character does not change the length by 1 for those types. I don't think that's a big deal: the language already behaves that way for those types. dstring does not have that problem and could be used when a change of exactly 1 is desired.

hasLength is a property that range algorithms query to learn whether a range's length has one particular meaning: the number of elements. It is perfectly fine for strings to fail hasLength and still expose .length - it's just that .length then has different semantics (it counts code units, not characters).
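The distinction can be checked at compile time. The sketch below assumes a Phobos in which the proposed changes are in place (as they are in current releases) and uses walkLength from std.range.primitives to count elements:

```d
import std.range.primitives : hasLength, isRandomAccessRange, walkLength;

// Narrow strings are neither random-access ranges nor ranges with
// a meaningful element count; dstring is both.
static assert(!hasLength!string);
static assert(!hasLength!wstring);
static assert( hasLength!dstring);
static assert(!isRandomAccessRange!string);
static assert(!isRandomAccessRange!wstring);
static assert( isRandomAccessRange!dstring);

void main()
{
    string s = "ağ";
    assert(s.length == 3);     // built-in property: UTF-8 code units
    assert(s.walkLength == 2); // element count: decoded code points
}
```

A range algorithm that needs the element count would therefore test hasLength first and fall back to something like walkLength when it is false.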

 > (b) Operate the change and mention that in range algorithms you should
 > check hasLength and only then use "length" under the assumption that it
 > really means "elements count".

The change sounds ok and hasLength should yield true. Or... can it return an enum { no, kind_of, yes } ;)

The current utf.decode takes the index by reference and advances it by the number of code units it decoded. Could popFront() do something similar?

I think we could dedicate a special function to that. In fact, I think one already exists - it's called stride().
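For comparison, here is a small sketch of both std.utf functions on a UTF-8 string. dropFront is a hypothetical helper, shown only to illustrate how stride() reports the width that a popFront-like operation has to skip; it is not a claim about how popFront is implemented:

```d
import std.utf : decode, stride;

// Hypothetical helper: drop the first code point by slicing past
// its encoded width, as reported by stride().
string dropFront(string s)
{
    return s[stride(s, 0) .. $];
}

void main()
{
    string s = "ağ";

    // stride() returns the number of code units in the sequence
    // that starts at the given index.
    assert(stride(s, 0) == 1); // 'a' takes one code unit
    assert(stride(s, 1) == 2); // 'ğ' takes two

    // decode() returns the code point and advances the index.
    size_t i = 0;
    dchar c = decode(s, i);
    assert(c == 'a' && i == 1);
    c = decode(s, i);
    assert(c == 'ğ' && i == 3);

    assert(dropFront(s) == "ğ");
}
```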


Andrei
