On 09/02/2017 11:02 AM, lithium iodate wrote:
> On Saturday, 2 September 2017 at 17:41:34 UTC, Ali Çehreli wrote:
>> You're right but I think there is no intention of interpreting the
>> result as UTF-8. "f62026" is just to be used as "f62026", which can be
>> converted byte-by-byte back to "ö…". That's how I understand the
>> requirement anyway.
>>
>> Ali
>
> That is not possible, because you cannot know whether "f620" and "26" or
> "f6" and "2026" (or any other combination) should form a code point
> each. Additional padding to constant width (8 hex chars) is needed.

OK, I see that I made a mistake, but I still don't think the conversion has to be one-way. If we can convert byte by byte, we should be able to convert back byte by byte, right? What I failed to do was iterate by code units: each UTF-8 code unit is a single byte, so it always maps to exactly two hex digits. The following gets the same string back:

import std.stdio;
import std.string;
import std.algorithm;
import std.range;
import std.utf;
import std.conv;

auto toHex(R)(R input) {
    // As Moritz Maxeiner says, this format is expensive
    return input.byCodeUnit.map!(c => format!"%02x"(c)).joiner;
}

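// Converts a single lowercase hex digit to its numeric value
int hexValue(C)(C c) {
    switch (c) {
    case '0': .. case '9':
        return c - '0';
    case 'a': .. case 'f':
        return c - 'a' + 10;
    default:
        // toHex only produces [0-9a-f]; anything else is rejected
        assert(false);
    }
}

// Assumes the input holds an even number of hex digits
auto fromHex(R, Dst = char)(R input) {
    return input.chunks(2).map!((ch) {
            // Each chunk is one byte: high nibble, then low nibble
            auto high = ch.front.hexValue * 16;
            ch.popFront();
            return high + ch.front.hexValue;
        }).map!(value => cast(Dst)value);
}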

void main() {
    assert("AAA".toHex.fromHex.equal("AAA"));

    assert("ö…".toHex.fromHex.equal("ö…".byCodeUnit));
    // Alternative check:
    assert("ö…".toHex.fromHex.text.equal("ö…"));
}
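
To make the ambiguity lithium iodate pointed out concrete, here is a minimal sketch comparing the two ways of iterating. The helper names hexByCodePoint and hexByCodeUnit are made up for this example and are not part of the code above. Iterating a string directly decodes it into code points, which format to a variable number of hex digits; iterating with byCodeUnit yields bytes, which always format to exactly two.

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.utf : byCodeUnit;

// Hypothetical helper: hex of each decoded code point (variable width)
auto hexByCodePoint(string s) {
    return s.map!(c => format!"%02x"(c)).join;
}

// Hypothetical helper: hex of each UTF-8 code unit (always two digits)
auto hexByCodeUnit(string s) {
    return s.byCodeUnit.map!(c => format!"%02x"(c)).join;
}

void main() {
    // U+00F6 and U+2026 run together; the boundary between them is lost
    assert(hexByCodePoint("ö…") == "f62026");

    // Bytes c3 b6 (ö) and e2 80 a6 (…): every pair of digits is one byte
    assert(hexByCodeUnit("ö…") == "c3b6e280a6");
}
```

The second form is longer, but it can be split back into bytes without any padding, which is what fromHex relies on.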

Ali
