Ali Çehreli wrote:
Andrei Alexandrescu wrote:
> It's no secret that string et al. are not a magic recipe for writing
> correct Unicode code. However, things are pretty good and could be
> further improved by operating the following changes in std.array and
> std.range:
>
> - make front() and back() for UTF-8 and UTF-16 automatically decode the
> first and last Unicode character
They would yield dchar, right? Wouldn't that cause trouble in templated
code?
Yes, dchar. There was some figuring out in parts of Phobos, but the
gains are well worth it.
The simplifications are enormous. Until now, Phobos didn't hit the nail
on the head with simple encoding/decoding/transcoding primitives. There
were many attempts in std.utf, std.encoding, and std.string - all very
clunky to use. Now I can just write s.front to get the first dchar of
any string, and s.popFront to drop it. Very simple!
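
For illustration, here is a minimal sketch of the s.front / s.popFront
usage described above. It assumes a Phobos version in which the proposed
change is in place (the array primitives in std.range.primitives), and
codePoints is a hypothetical helper name, not something from Phobos.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: collect the decoded code points of a UTF-8 string.
dchar[] codePoints(string s)
{
    dchar[] result;
    while (!s.empty)
    {
        result ~= s.front;  // decodes one code point and yields a dchar
        s.popFront();       // drops every code unit of that code point
    }
    return result;
}

void main()
{
    string s = "ağ";
    static assert(is(typeof(s.front) == dchar));
    assert(s.length == 3);           // code units: 'a' takes 1, 'ğ' takes 2
    assert(codePoints(s) == "ağ"d);  // but only two code points
}
```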
> - make popFront() and popBack() skip one entire Unicode character
> (instead of just one code unit)
That's perfectly fine, because the opposite operations do "encode":
string s = "ağ";
assert(s.length == 3);
s ~= 'ş';
assert(s.length == 5);
> - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings
Ok.
> - change hasLength to return false for UTF-8 and UTF-16 strings
I don't understand that one. Strings have lengths; it is just that
adding or removing a character does not change the length by exactly 1
for those types. I don't think it's a big deal, since the language
already behaves that way for those types. dstring does not have that
problem and could be used when a by-1 change is desired.
hasLength is a property used by range algorithms to tell them that a
range stores the length with a particular meaning (the number of
elements). It is perfectly fine that strings don't obey hasLength but do
expose .length - it's just that it has different semantics.
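
A sketch of what that distinction looks like in generic code, assuming a
Phobos in which hasLength is false for UTF-8 and UTF-16 strings as
proposed. elementCount is a hypothetical name used only for this
example; std.range.walkLength already performs the same dispatch.

```d
import std.range : hasLength, isInputRange, walkLength;

// Hypothetical helper: number of elements the range will yield.
size_t elementCount(R)(R r)
if (isInputRange!R)
{
    static if (hasLength!R)
        return r.length;      // length really means "element count"
    else
        return r.walkLength;  // count by iterating (decodes narrow strings)
}

void main()
{
    string s = "ağş";
    static assert(!hasLength!string);  // .length exists but counts code units
    static assert(hasLength!dstring);  // one code unit per code point
    assert(s.length == 5);             // UTF-8 code units
    assert(elementCount(s) == 3);      // code points
    assert(elementCount("ağş"d) == 3);
}
```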
> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".
The change sounds ok and hasLength should yield true. Or... can it
return an enum { no, kind_of, yes } ;)
The current std.utf.decode takes the index by reference and advances it
by the number of code units it consumed. Could popFront() do something
similar?
I think we could dedicate a separate function to that. In fact I believe
one already exists: it's called stride().
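
As a rough sketch of how stride() relates to decode(): std.utf.stride
reports how many code units the code point at a given index occupies,
without decoding it, while std.utf.decode returns the code point and
advances the index. dropFirst is a hypothetical helper written only for
this example.

```d
import std.utf : decode, stride;

// Hypothetical helper: slice off the first code point of a non-empty string.
string dropFirst(string s)
{
    return s[stride(s, 0) .. $];
}

void main()
{
    string s = "ağş";
    size_t i = 0;
    assert(stride(s, i) == 1);  // 'a' is one code unit
    i += stride(s, i);
    assert(stride(s, i) == 2);  // 'ğ' is two code units

    size_t j = i;
    dchar c = decode(s, j);     // decode advances j past 'ğ'
    assert(c == 'ğ' && j == 3);

    assert(dropFirst(s) == "ğş");
}
```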
Andrei