Re: Why is string.front dchar?

Jakob Ovrum Mon, 20 Jan 2014 02:02:36 -0800

On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:

This is wrong. String in D is de facto (by implementation, spec may say whatever is convenient for advertising D) array of single bytes which can keep UTF-8 code units. No way string type in D is always a string in a sense of code points/characters. Sometimes it happens that string type behaves like 'string', but if you put UTF-16 or UTF-32 text it would remind you what string type really is.

By implementation they are also UTF strings. String literals use UTF, `char.init` is 0xFF and `wchar.init` is 0xFFFF, foreach over narrow strings with `dchar` iterator variable type does UTF decoding, etc.
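A minimal sketch of these points, in case anyone wants to check them. The helper names `countDecoded` and `countUnits` are made up for illustration and are not part of any library; it should build with something like `dmd -unittest -main -run`.

```d
// Hypothetical helpers: count foreach iterations over a narrow string.
size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar c; s) // dchar loop variable: UTF-8 is decoded per code point
        ++n;
    return n;
}

size_t countUnits(string s)
{
    size_t n;
    foreach (char c; s) // char loop variable: raw code units, no decoding
        ++n;
    return n;
}

unittest
{
    // The default initializers are values that are not valid UTF code units.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);

    // "säд" is 1 + 2 + 2 UTF-8 code units, but 3 code points.
    assert(countDecoded("säд") == 3);
    assert(countUnits("säд") == 5);
}
```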

I don't think you know what you're talking about; putting UTF-16 or UTF-32 in `string` is utter madness and not trivially possible. We have `wchar`/`wstring` and `dchar`/`dstring` for UTF-16 and UTF-32, respectively.

Operations on code units are rare, which is why the standard library instead treats strings as ranges of code points, for correctness by default. However, we must not prevent the user from being able to work on arrays of code units, as many string algorithms can be optimized by not doing full UTF decoding. The standard library does this on many occasions, and there are more to come.
This is attempt to explain problematic design as a wise action.

No, it's not. Please leave crappy, unsubstantiated arguments like this out of these forums.
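To make the code point versus code unit distinction from the quoted paragraph concrete, here is a rough sketch. The function names are hypothetical; `walkLength`, `ElementType` and `representation` are the Phobos symbols as I understand them. `countAscii` is only meant to illustrate the kind of optimization that skips decoding, and it relies on the fact that bytes below 0x80 never appear inside a multi-byte UTF-8 sequence.

```d
import std.range.primitives : ElementType, walkLength;
import std.string : representation;

// The range primitives see a narrow string as a range of code points.
static assert(is(ElementType!string == dchar));

// Hypothetical helper: walks the string as a range, decoding as it goes.
size_t codePointCount(string s)
{
    return s.walkLength;
}

// Hypothetical helper: views the same data as immutable(ubyte)[], no decoding.
size_t codeUnitCount(string s)
{
    return s.representation.length;
}

// Hypothetical helper: counts an ASCII character by scanning raw code units.
size_t countAscii(string s, char needle)
{
    assert(needle < 0x80, "needle must be ASCII");
    size_t n;
    foreach (ubyte u; s.representation)
        if (u == needle)
            ++n;
    return n;
}

unittest
{
    assert(codePointCount("säд") == 3);
    assert(codeUnitCount("säд") == 5);
    assert(countAscii("a,ä,д", ',') == 2);
}
```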

[1] http://dlang.org/type
By the way, the link you provide says char is unsigned 8 bit type which can keep value of UTF-8 code unit.

Not *can*, but *does*. Otherwise it is an error in the program. The specification, compiler implementation (as shown above) and standard library all treat `char` as a UTF-8 code unit. Treat it otherwise at your own peril.

UTF is irrelevant because the problem is in D implementation. See http://forum.dlang.org/thread/hoopiiobddbapybbw...@forum.dlang.org (in particular 2nd page).
The root of the issue is that D does not provide 'utf' type which would handle correctly strings and characters irrespective of the format. But instead the language pretends that it supports such type by allowing to convert to single byte char array both literals "sad" and "säд". And ['s', 'ä', 'д'] is by the way neither char[], no wchar[], even not dchar[] but sequence of integers, which compounds oddities in character types.

The only problem in the implementation here that you illustrate is that `['s', 'ä', 'д']` is of type `int[]`, which is a bug. It should be `dchar[]`. The length of `char[]` works as intended.
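For reference, a small sketch of what "works as intended" means for lengths. The helper `codeUnitLengths` is hypothetical; `std.conv.to` does the transcoding. It deliberately asserts nothing about the inferred type of the untyped literal and only shows that an explicitly typed `dchar[]` holds one element per character.

```d
import std.conv : to;

// Hypothetical helper: code units the same text occupies as UTF-8, UTF-16 and UTF-32.
size_t[3] codeUnitLengths(string s)
{
    return [s.length, s.to!wstring.length, s.to!dstring.length];
}

unittest
{
    auto n = codeUnitLengths("säд");
    assert(n[0] == 5); // UTF-8: 1 + 2 + 2 code units
    assert(n[1] == 3); // UTF-16: all three characters are in the BMP
    assert(n[2] == 3); // UTF-32: one code unit per code point

    // With the element type spelled out, each element is one code point.
    dchar[] chars = ['s', 'ä', 'д'];
    assert(chars.length == 3);
}
```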

Problems with string type can be illustrated as possible situation in domain of integers type. Assume that user wants 'number' type which accepts both integers, floats and doubles and treats them properly. This would require either library solution or a new special type in a language which is supported by both compiler and runtime library, which performs operation at runtime on objects of number type according to their effective type.
D designers want to support such feature (to make the language better), but as it happens in other situations, the support is only limited: compiler allows to do
alias immutable(int)[] number;
number my_number = [0, 3.14, 3.14l];

I don't understand this example. The compiler does *not* allow that code; try it for yourself.
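One way to try it without staring at error messages is to ask the compiler directly. This is only a sketch: the enum names are made up, and I have written the suffix as `3.14L` on the assumption that the lowercase `l` in the quoted snippet would be rejected by the lexer on its own, before the type check is ever reached.

```d
alias number = immutable(int)[];

// Hypothetical names. __traits(compiles, ...) is false when the body fails semantic analysis.
enum bool acceptsMixedLiteral = __traits(compiles, { number n = [0, 3.14, 3.14L]; });
enum bool acceptsIntLiteral = __traits(compiles, { number n = [0, 3, 4]; });

// The floating-point literal has no implicit conversion to immutable(int)[].
static assert(!acceptsMixedLiteral);
// An all-integer literal is accepted.
static assert(acceptsIntLiteral);
```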
