On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
As I see it, a ubyte 0x20 could be decoded to an ASCII char ' ', and likewise to wchar or dchar. It doesn't (to me) make sense to decode a char to a wchar or dchar. Anyway, you've shown me how decodeFront can be used, so great!

The character ' ' simply is the number 0x20 in char, wchar and dchar. The difficulty arises when you use non-ASCII characters:

if ("€"[0] == '€')

The code point of € is U+20AC, but a char only holds values up to 0xFF. To work around that, UTF-8 encodes higher code points as multiple bytes (code units). The € sign is represented as [0xE2, 0x82, 0xAC]. So the code above actually checks 0xE2 == 0x20AC, which evaluates to false. If you call decodeFront on [0xE2, 0x82, 0xAC], it will return 0x20AC and modify the range to be [] since it consumed all three code units. That way you can handle code points properly.
See: https://en.wikipedia.org/wiki/UTF-8#Examples
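To make that concrete, here is a minimal sketch of the decoding step. The helper firstCodePoint is a made-up name for illustration, not part of Phobos; only std.utf.decodeFront is the real library call.

```d
import std.utf : decodeFront;

// Hypothetical helper (not a Phobos function): decode the first code point
// of a UTF-8 string and report how many code units are left afterwards.
dchar firstCodePoint(string s, out size_t remaining)
{
    dchar c = s.decodeFront; // pops the consumed code units off the local slice s
    remaining = s.length;
    return c;
}

void main()
{
    string euro = "€";
    assert(euro.length == 3);  // three UTF-8 code units
    assert(euro[0] == 0xE2);   // indexing yields a code unit, not a code point
    assert(euro[0] != '€');    // so this compares 0xE2 with 0x20AC

    size_t rest;
    assert(firstCodePoint(euro, rest) == 0x20AC);
    assert(rest == 0);         // all three code units were consumed
}
```

Because the string is passed by value, the caller's euro is left untouched; calling decodeFront directly on your own range is what shrinks it.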

On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
Supplementary question: is an operation like r.map!(x => cast(char) x) effectively a run-time no-op and just to keep the compiler happy, or does it actually result in code being executed? I came across a similar issue with ranges recently where the answer was to map immutable(byte) to byte in the same way.

With dmd without optimization, the function passed to map will compile to:
        push    RBP          //
        mov     RBP,RSP      //
        sub     RSP,010h     // build stack frame
        mov     -8[RBP],EDI  // put argument0 on the stack
        mov     AL,-8[RBP]   // put the stack value in the lower 8 bits of the return register
        leave                // delete stack frame
        ret                  // return

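So the function does nothing beyond returning its argument, which makes it essentially a run-time no-op, although an unoptimized build still executes these few instructions for every element. However, if you pass -O -inline to dmd I'm pretty sure it will optimize it away. GDC and LDC with -O1 or higher will certainly eliminate all run-time cost.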
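For reference, a small sketch of the kind of map being discussed. The wrapper asChars is a hypothetical name chosen for this example; the cast only changes the element type the range reports, the byte values themselves stay the same.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.range.primitives : ElementType;

// Hypothetical wrapper (name invented for this example): lazily view
// a slice of bytes as a range of char.
auto asChars(const(ubyte)[] bytes)
{
    return bytes.map!(x => cast(char) x);
}

void main()
{
    immutable(ubyte)[] bytes = [0x68, 0x69];
    auto chars = asChars(bytes);

    // Only the element type differs from the input range.
    static assert(is(ElementType!(typeof(chars)) == char));
    assert(chars.equal("hi"));
}
```

map is lazy, so the cast runs only when an element is read through front.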
