Andrei Alexandrescu wrote:
> It's no secret that string et al. are not a magic recipe for writing
> correct Unicode code. However, things are pretty good and could be
> further improved by operating the following changes in std.array and
> std.range:
>
> - make front() and back() for UTF-8 and UTF-16 automatically decode the
> first and last Unicode character

They would yield dchar, right? Wouldn't that cause trouble in templated code?
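For illustration, a minimal sketch of what that trouble looks like, assuming the proposed auto-decoding behavior (which is what today's Phobos does for narrow strings): the element type of string becomes dchar, so templated code that expects it to match the code-unit type char breaks.

```d
import std.range;

void main()
{
    // With auto-decoding, the element type of string is dchar, not char.
    static assert(is(ElementType!string == dchar));

    string s = "ağ";
    assert(s.front == 'a');   // front yields a decoded dchar
    s.popFront();
    assert(s.front == 'ğ');   // the full character, not a lone code unit
}
```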

> - make popFront() and popBack() skip one entire Unicode character
> (instead of just one code unit)

That's perfectly fine, because the opposite operations do "encode":

    string s = "ağ";
    assert(s.length == 3);   // 'a' is 1 code unit, 'ğ' is 2 in UTF-8
    s ~= 'ş';
    assert(s.length == 5);   // 'ş' encodes to 2 more code units

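The decode side would mirror that: a sketch assuming popFront() is changed as proposed, so that each call skips one whole character rather than one code unit.

```d
import std.range;

void main()
{
    string s = "ağ";
    s.popFront();            // skips 'a': one code unit
    assert(s.length == 2);   // "ğ" remains: two code units in UTF-8
    s.popFront();            // skips 'ğ': two code units at once
    assert(s.empty);
}
```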
> - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings

Ok.

> - change hasLength to return false for UTF-8 and UTF-16 strings

I don't understand that one. Strings do have lengths; it's just that adding or removing a single character does not always change the length by 1 for those types. I don't think that's a big deal: it is already so in the language for those types. dstring does not have that problem and can be used when by-1 changes are desired.
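To illustrate the dstring point: in UTF-32 every character is a single code unit, so length really is the character count there, while for string it is the UTF-8 code-unit count.

```d
void main()
{
    string s = "ağş";
    assert(s.length == 5);   // UTF-8 code units: 1 + 2 + 2
    dstring d = "ağş"d;
    assert(d.length == 3);   // UTF-32: one code unit per character
}
```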

> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".

The change sounds ok and hasLength should yield true. Or... can it return an enum { no, kind_of, yes } ;)

The current std.utf.decode takes the index by reference and advances it by the number of code units consumed. Could popFront() do something similar?
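That by-reference interface looks like this (a sketch; the exact signature may differ between versions):

```d
import std.utf;

void main()
{
    string s = "ağ";
    size_t i = 0;
    dchar c = decode(s, i);  // decodes 'a', advances i past it
    assert(c == 'a' && i == 1);
    c = decode(s, i);        // 'ğ' spans two code units
    assert(c == 'ğ' && i == 3);
}
```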

I think that's it: front() and popFront() are separated for cohesion. What is causing trouble here is the separation of "by-N" from popFront().

You are concerned that the user will assume that popFront() reduces the length by 1. I think that is the real problem here.

How about something like:

  // returns the amount by which the next popFront() will reduce length
  int nextStep();
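For strings, such a nextStep() could be sketched on top of std.utf.stride, which reports how many code units the character at a given index occupies (nextStep itself is hypothetical, from the suggestion above):

```d
import std.utf;

// Hypothetical: how many code units the next popFront() would consume.
size_t nextStep(string s)
{
    return stride(s, 0);
}

void main()
{
    string s = "ağ";
    assert(nextStep(s) == 1); // 'a' is one code unit
    s = s[1 .. $];
    assert(nextStep(s) == 2); // 'ğ' is two code units
}
```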

Ali
