On Sunday, 25 August 2013 at 19:25:08 UTC, qznc wrote:
Apparently, ElementType!string evaluates to dchar. I would have expected char. Why is that?

It is mentioned in the documentation of `ElementType`. Use `std.range.ElementEncodingType` or `std.traits.ForeachType` to get `char` and `wchar` when given arrays of those two types.
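
A minimal sketch of the difference, assuming a standard Phobos with the usual string range behaviour; `Unqual` is used only so the checks do not depend on the `immutable` qualifier:

```d
import std.range : ElementType, ElementEncodingType;
import std.traits : ForeachType, Unqual;

// As a range, a narrow string yields decoded code points.
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));

// The encoding type is the code unit the array actually stores.
static assert(is(Unqual!(ElementEncodingType!string) == char));
static assert(is(Unqual!(ElementEncodingType!wstring) == wchar));

// ForeachType is what a plain `foreach (c; str)` would infer for `c`.
static assert(is(Unqual!(ForeachType!string) == char));
static assert(is(Unqual!(ForeachType!wstring) == wchar));

// Arrays of other types are unaffected: all three traits agree.
static assert(is(ElementType!(int[]) == int));
static assert(is(ElementEncodingType!(int[]) == int));

void main() {}
```

All checks run at compile time, so the program compiling is the whole demonstration.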

As for the rationale:

`string`, an alias for `immutable(char)[]`, is an array of UTF-8 code units, i.e. an array of `char`s. As a range, however, it is treated as a forward range of code points, each represented as a UTF-32 code unit (`dchar`). This was a (slightly controversial) choice made to make Unicode-correct code the easiest and most intuitive to write, as code points are generally more useful than code units.

Note that it is not a random-access range. UTF-8 and UTF-16 are variable-length encodings, so several code units can be required to encode a single code point. Hence, getting the n-th code point of a UTF-8 or UTF-16 string requires decoding from the start of the string rather than a simple index operation.
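
To illustrate, here is a small sketch (the sample string is just an example I picked) showing that the array length counts code units, while walking the range counts code points:

```d
import std.range : drop, front, isForwardRange, isRandomAccessRange, walkLength;

void main()
{
    // U+00E4 (a with diaeresis) takes two code units in UTF-8.
    string s = "h\u00E4h";

    static assert(isForwardRange!string);
    static assert(!isRandomAccessRange!string);

    assert(s.length == 4);      // array length: UTF-8 code units
    assert(s.walkLength == 3);  // range length: code points, found by decoding

    // Indexing the array gives a code unit, not the second code point.
    assert(s[1] == 0xC3);

    // Reaching the second code point means skipping over the first one.
    assert(s.drop(1).front == '\u00E4');
}
```

`walkLength` and `drop` both have to decode their way through the string, which is why neither is a constant-time operation here.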

Another name for a code point is "character" (technically, a character is what the code point maps to in the UCS). However, the name can be deceptive: the units we see on screen after rendering are "graphemes", since Unicode characters can be combining, zero-width, etc.
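
As a rough illustration of the three levels, assuming a Phobos version that provides `std.uni.byGrapheme` (the combining sequence below is my own example):

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: rendered as one unit.
    string s = "e\u0301";

    assert(s.length == 3);                 // UTF-8 code units: 1 + 2
    assert(s.walkLength == 2);             // code points
    assert(s.byGrapheme.walkLength == 1);  // graphemes
}
```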

To get a range of UTF-8 or UTF-16 code units, the code units have to be represented as something other than `char` and `wchar`. For example, you can cast your string to `immutable(ubyte)[]`, operate on that, and cast it back at a later point.
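
A short sketch of the cast approach for a UTF-8 string (variable names are illustrative only). Phobos also has `std.string.representation`, which, as far as I know, gives the same view without an explicit cast:

```d
import std.range : ElementType, isRandomAccessRange;

void main()
{
    string s = "h\u00E4h";

    // Reinterpret the same memory as raw bytes; no copy is made.
    auto bytes = cast(immutable(ubyte)[]) s;

    // The range element type is now the code unit itself, and the
    // array is a random-access range again.
    static assert(is(ElementType!(typeof(bytes)) == immutable(ubyte)));
    static assert(isRandomAccessRange!(typeof(bytes)));

    assert(bytes.length == 4);
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA4);  // UTF-8 encoding of U+00E4

    // Cast back once the code-unit level work is done.
    auto back = cast(string) bytes;
    assert(back == s);
}
```

Keep in mind that slicing `bytes` at an arbitrary index can split a code point, so casting such a slice back may yield an invalid UTF-8 string.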
