Re: The Case For Autodecode

ag0aep6g via Digitalmars-d Fri, 03 Jun 2016 11:48:06 -0700

On 06/03/2016 07:51 PM, Patrick Schluter wrote:

> You mean that '¶' is represented internally as 1 byte 0xB6 and that it
> can be handled as such without error? This would mean that char literals
> are broken. The only valid way to represent '¶' in memory is 0xC2 0xB6.
> Sorry if I misunderstood, I'm only starting to learn D.

That's not what happens. There is no single char for '¶', that's right, and D gets that right: in a string (an array of char), '¶' takes two chars.

But there is a single wchar for it. wchar is a UTF-16 code unit, 2 bytes wide. UTF-16 encodes '¶' (U+00B6) as the single code unit 0x00B6, so that's correct.
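To make the code-unit counts concrete, here is a small sketch. The names `pilcrow8` and `pilcrow16` are just for illustration; `std.string.representation` is only used to print the raw code units:

----
import std.stdio : writefln;
import std.string : representation;

enum string  pilcrow8  = "¶";  // UTF-8:  immutable(char)[],  1-byte code units
enum wstring pilcrow16 = "¶"w; // UTF-16: immutable(wchar)[], 2-byte code units

// U+00B6 needs two UTF-8 code units but only one UTF-16 code unit.
static assert(pilcrow8.length == 2 && pilcrow8[0] == 0xC2 && pilcrow8[1] == 0xB6);
static assert(pilcrow16.length == 1 && pilcrow16[0] == 0x00B6);

void main()
{
    writefln("UTF-8:  %(%02X %)", pilcrow8.representation);  // UTF-8:  C2 B6
    writefln("UTF-16: %(%04X %)", pilcrow16.representation); // UTF-16: 00B6
}
----

Note that the UTF-8 continuation byte 0xB6 has the same numeric value as the UTF-16 code unit 0x00B6, which is exactly what makes mixed comparisons misleading.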

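The problem is that you can accidentally search for a wchar in a range of chars. Every char is compared to the wchar by numeric value. But the numeric values of a char (a UTF-8 code unit) don't mean the same as those of a wchar (a UTF-16 code unit), so you get nonsensical results. For example, "ö" (U+00F6) is stored in UTF-8 as 0xC3 0xB6, and that second char compares equal to the wchar '¶' (0x00B6).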
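A minimal sketch of such a false positive. `containsUnit` is a hypothetical hand-rolled helper, not a Phobos function; it does the same numeric element-by-element comparison a code-unit search does, so the result doesn't depend on Phobos internals:

----
import std.stdio : writeln;

// Hypothetical helper: naive linear search that compares each element
// of `haystack` to `needle` with ==, i.e. by numeric value only.
bool containsUnit(E, N)(const(E)[] haystack, N needle)
{
    foreach (unit; haystack) // iterates raw code units, no decoding
        if (unit == needle)
            return true;
    return false;
}

void main()
{
    // "ö" is U+00F6, stored in UTF-8 as the two chars 0xC3 0xB6.
    string s = "ö";
    // '¶' is U+00B6, a single wchar with value 0x00B6.
    wchar pilcrow = '¶';

    // The second UTF-8 code unit of "ö" (0xB6) is numerically equal to
    // the wchar 0x00B6, so the search reports a '¶' that isn't there.
    writeln(containsUnit(s, pilcrow)); // true
}
----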

A similar implicit conversion lets you search for a large number in a byte[]:


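----
import std.stdio : writeln;
void main() {
    byte[] arr = [1, 2, 3];
    // x is promoted to int; a byte can't hold 1000, so this never matches.
    foreach (x; arr) if (x == 1000) writeln("found it!");
}
----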

You won't ever find 1000 in a byte[], of course. The byte type simply can't store the value. But you can compare a byte with an int, and that comparison is meaningful (the byte is promoted to int without changing its value), unlike the comparison of a char with a wchar.

You can also produce false positives with numeric types by mixing signed and unsigned types:


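----
import std.stdio : writeln;
void main() {
    int[] arr = [1, -1, 3];
    // x is converted to uint for the comparison, so -1 matches uint.max.
    foreach (x; arr) if (x == uint.max) writeln("found it!");
}
----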

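uint.max is a large number; -1 is a small number. They're considered equal here because of an implicit conversion that messes with the meaning of the bits: for the comparison, -1 is converted to uint, and its bit pattern 0xFFFF_FFFF then reads as uint.max.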

False negatives are not possible with numeric types, at least not in the same way as with differently sized Unicode code units.
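Here is a sketch of such false negatives, using two hypothetical helpers (neither is a Phobos function): `containsUnit` compares code units by numeric value, while `containsCodePoint` lets `foreach (dchar c; ...)` decode the UTF-8 first. A wchar above 0xFF can never equal a char, and a smaller one is only found if its value happens to coincide with one of the UTF-8 bytes:

----
import std.stdio : writeln;

// Hypothetical helper: naive search comparing each element to `needle` with ==.
bool containsUnit(E, N)(const(E)[] haystack, N needle)
{
    foreach (unit; haystack)
        if (unit == needle)
            return true;
    return false;
}

// Hypothetical helper: search by code point. Declaring the loop variable
// as dchar makes the language decode the UTF-8 input.
bool containsCodePoint(const(char)[] haystack, dchar needle)
{
    foreach (dchar c; haystack)
        if (c == needle)
            return true;
    return false;
}

void main()
{
    wchar oUmlaut = 'ö'; // U+00F6; "ö" in UTF-8 is 0xC3 0xB6, neither byte is 0xF6
    wchar euro = '€';    // U+20AC; no char can hold a value that large

    writeln(containsUnit("ö", oUmlaut));      // false: a false negative
    writeln(containsUnit("€", euro));         // false: a false negative
    writeln(containsCodePoint("ö", oUmlaut)); // true
    writeln(containsCodePoint("€", euro));    // true
}
----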
