Chad J wrote:
Andrei Alexandrescu wrote:
...

What can be done about that? I see a number of solutions:

(a) Do not make the change at all.

(b) Make the change, and document that range algorithms should check
hasLength and only then use "length", under the assumption that it
really means "element count" (see the sketch after this list).

(c) Deprecate the name .length for UTF-8 and UTF-16 strings, and define
a different name for that. Any other name (codeUnits, codes etc.) would
do. The entire point is to not make algorithms believe strings have a
.length property.

(d) Have std.range define a distinct property called e.g. "count" and
then specialize it appropriately. Then change all references to .length
in std.algorithm and elsewhere to .count.
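
For (b), the convention would look something like this inside an
algorithm (a minimal sketch; hasLength, isInputRange, and walkLength
are the std.range names assumed here):

import std.range : hasLength, isInputRange, walkLength;

// Trust .length only when hasLength!R says it really is the element
// count; otherwise fall back to walking the range.
size_t elementCount(R)(R r) if (isInputRange!R)
{
    static if (hasLength!R)
        return r.length;       // O(1)
    else
        return walkLength(r);  // O(n)
}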

What would you do? Any ideas are welcome.


Andrei

I'm leaning towards (c) here.

To me, .length on char[] and wchar[] is kinda like doing this:

struct SomePOD
{
    int a, b; // 2 * 4 bytes
    double y; // 8 bytes
}


SomePOD pod;
// Hypothetical: imagine .length reported the size in bytes, which is
// what pod.sizeof actually gives you for a struct.
auto len = pod.length;
assert(len == 16); // true.


I'll admit it's not a perfect analogy.  What I'm playing on here is that
the .length on char[] and wchar[] returns the /size of/ the string in
bytes rather than the /length/ of the string in number of (well-formed)
characters.
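
For example (a quick demonstration; std.utf.count is used here for the
character count):

import std.utf : count;

void main()
{
    char[] s = "日本語".dup; // 3 characters, 3 UTF-8 bytes each
    assert(s.length == 9);   // the size: 9 code units (bytes)
    assert(count(s) == 3);   // the length: 3 well-formed characters
}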

Unfortunately, .sizeof is supposed to return the size of the string's
reference (8 bytes on 32-bit x86: a pointer plus a length) and not the
size of the string's data, IIRC. So that name is taken.
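
That is, .sizeof only sees the (length, pointer) pair that makes up
the array reference:

static assert(string.sizeof == 2 * size_t.sizeof); // 8 on 32-bit, 16 on 64-bit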

So perhaps a .bytes or .nbytes property.  Maybe make it work for arrays
of structs and things like that too.  A tuple (or any container) of
non-homogeneous elements could probably benefit from this property as well.
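
Something like this, say (just a sketch; the name nbytes and the
free-function spelling are mine):

// Hypothetical property: total size in bytes of an array's data.
@property size_t nbytes(T)(T[] arr)
{
    return arr.length * T.sizeof;
}

unittest
{
    assert("hello".nbytes == 5);     // char[]: 1 byte per code unit
    assert("hello"w.nbytes == 10);   // wchar[]: 2 bytes per code unit
    assert([1.0, 2.0].nbytes == 16); // double[]: 8 bytes per element
}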

With such a property available, I wouldn't miss .length at all. It's
quite misleading.

I hear you. Actually, to either quench or add to the confusion, .length
for wstring returns the length in 16-bit units, not bytes.
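
For instance:

void main()
{
    string  s = "héllo";   // é takes 2 bytes in UTF-8
    wstring w = "héllo"w;
    assert(s.length == 6); // 6 UTF-8 code units (bytes)
    assert(w.length == 5); // 5 UTF-16 code units, not 10 bytes
}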

Andrei
