On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.g...@gmx.ch> wrote:

On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:

So I think it's not only limiting to require x.length to be $, it's very
wrong in some cases.

Also, think of a string. It has no length (well technically, it does,
but it's not the number of elements), but it has a distinct end point. A
properly written string type would fail to compile if $ was s.length.


But you'd have to compute the length anyways in the general case:

str[0..$/2];

Or am I misunderstanding something?


That's half the string in code units, not code points.

If string was properly implemented, this would fail to compile. $ is not
the length of the string range (meaning the number of code points). The
given slice operation might actually create an invalid string.
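To make the hazard concrete, here is a minimal D sketch (a hypothetical illustration, not code from the thread; it uses `validate` from `std.utf` to check the slice):

```d
import std.stdio;
import std.utf : validate, UTFException;

void main()
{
    string s = "日本語";      // 3 code points, but 9 UTF-8 code units
    auto half = s[0 .. $/2]; // $ is the code-unit length, so this is s[0 .. 4]
    try
    {
        validate(half);      // throws: the slice cuts '本' mid-sequence
        writeln("valid UTF-8");
    }
    catch (UTFException e)
        writeln("invalid UTF-8 slice");
}
```

Because `$` counts code units, `s[0 .. $/2]` here takes the first four bytes: all of '日' plus one byte of '本', which is not a well-formed UTF-8 string.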

Programmers have to be aware of that if they want efficient code that deals with Unicode. I think having random access to the code units while being able to iterate per code point is fine, because it gives you the best of both worlds. Manually decoding a string and slicing it at positions that were remembered to be safe has been good enough for me; at least it is efficient.


It's tricky, because you want fast slicing, but only certain slices are
valid. I once created a string type that used a char[] as its backing,
but actually implemented the limitations that std.range tries to enforce
(but cannot). It's somewhat of a compromise. If $ was mapped to
s.length, it would fail to compile, but I'm not sure what I *would* use
for $. It actually might be the code units, which would not make the
above line invalid.

-Steve

Well, it would have to be consistent for a string type that "does it right". Either the string is indexed with code units or it is indexed with code points, and the other option should be provided separately. Dollar should just be the length of whatever is used for indexing/slicing, and having that differ from length makes for a somewhat awkward interface imho.

Btw, D double-quoted string literals let you define invalid byte sequences with e.g. octal escapes:
string s = "\377";

What would be the use cases for that? Shouldn't \377 map to the extended ASCII charset instead and yield the same code point that would be given in C double-quoted strings?
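For contrast, a small sketch (assuming the octal-escape semantics described above: \377 inserts the raw byte 0xFF, which never occurs in well-formed UTF-8, whereas code point U+00FF, 'ÿ' in Latin-1, encodes as two code units):

```d
void main()
{
    string s = "\377";                     // one raw code unit, 0xFF
    assert(s.length == 1 && s[0] == 0xFF); // not valid UTF-8

    string t = "\u00FF";                   // the code point U+00FF ('ÿ')
    assert(t.length == 2 && t[0] == 0xC3 && t[1] == 0xBF); // proper UTF-8
}
```

So the octal escape writes bytes, not characters, which is what makes it possible to construct invalid sequences in the first place.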



