On 06/03/2016 07:51 PM, Patrick Schluter wrote:
> You mean that '¶' is represented internally as 1 byte 0xB6 and that it
> can be handled as such without error? This would mean that char literals
> are broken. The only valid way to represent '¶' in memory is 0xC2 0xB6.
> Sorry if I misunderstood, I'm only starting to learn D.

That's not what happens. You're right that there is no single char for '¶', and D handles that correctly.

But there is a single wchar for it. wchar is a UTF-16 code unit, 2 bytes. UTF-16 encodes '¶' as a single code unit, so that's correct.

The problem is that you can accidentally search for a wchar in a range of chars. Every char is compared to the wchar by numeric value. But a char is a UTF-8 code unit and a wchar is a UTF-16 code unit, so equal numeric values don't mean the same thing, and you get nonsensical results.
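To make that concrete, here is a minimal sketch. The helper name containsCodeUnit is made up for illustration and is not a Phobos function; it just does the code unit by code unit comparison described above, and shows both a false negative and a false positive.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of Phobos: compares every UTF-8 code unit
// of haystack to a single UTF-16 code unit by numeric value.
bool containsCodeUnit(const(char)[] haystack, wchar needle)
{
    foreach (char c; haystack)
        if (c == needle) // compiles: the char is implicitly widened to wchar
            return true;
    return false;
}

void main()
{
    // 'ö' is U+00F6: UTF-8 code units 0xC3 0xB6, UTF-16 code unit 0x00F6.
    // Neither 0xC3 nor 0xB6 equals 0xF6, so the search misses it.
    writeln(containsCodeUnit("ö", 'ö')); // false (false negative)

    // 'Ã' is U+00C3: its UTF-16 code unit 0x00C3 happens to equal the
    // first UTF-8 code unit of "ö".
    writeln(containsCodeUnit("ö", 'Ã')); // true (false positive)
}
```

With '¶' itself the search would report a match, but only because the second UTF-8 code unit of "¶" is 0xB6, which is a coincidence of the encoding rather than a meaningful comparison.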

A similar implicit conversion lets you search for a large number in a byte[]:

----
byte[] arr = [1, 2, 3];
foreach(x; arr) if (x == 1000) writeln("found it!");
----

You won't ever find 1000 in a byte[], of course. The byte type simply can't store the value. But you can compare a byte with an int: the byte is promoted to int, which preserves its value. So that comparison is meaningful, unlike the comparison of a char with a wchar.

You can also produce false positives with numeric types, by mixing signed and unsigned types:

----
int[] arr = [1, -1, 3];
foreach(x; arr) if (x == uint.max) writeln("found it!");
----

uint.max is a large number, -1 is a small number. They're considered equal here because the int is implicitly converted to uint, which keeps the bits but changes their meaning: -1 becomes 4294967295.
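A small sketch of what that conversion does, assuming the usual 32-bit int and uint. The variable names are just for illustration. Widening both sides to long first keeps the values intact, and then they no longer compare equal.

```d
import std.stdio : writeln;

void main()
{
    int x = -1;

    // The same implicit int -> uint conversion the comparison performs:
    // the bit pattern stays the same, the value it stands for changes.
    uint asUnsigned = x;
    writeln(asUnsigned);             // 4294967295
    writeln(asUnsigned == uint.max); // true

    // The mixed comparison from the example above.
    writeln(x == uint.max);          // true

    // Widening both operands to long preserves both values.
    writeln(cast(long) x == cast(long) uint.max); // false
}
```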

False negatives are not possible with numeric types. At least not in the same way as with differently sized Unicode code units.
