On 09/02/2017 11:02 AM, lithium iodate wrote:
> On Saturday, 2 September 2017 at 17:41:34 UTC, Ali Çehreli wrote:
>> You're right but I think there is no intention of interpreting the
>> result as UTF-8. "f62026" is just to be used as "f62026", which can be
>> converted byte-by-byte back to "ö…". That's how understand the
>> requirement anyway.
>>
>> Ali
>
> That is not possible, because you cannot know whether "f620" and "26" or
> "f6" and "2026" (or any other combination) should form a code point
> each. Additional padding to constant width (8 hex chars) is needed.

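The ambiguity described in the quoted text can be reproduced with a small sketch. It assumes the earlier approach printed each code point as unpadded hex. codePointHex is a hypothetical helper written only for illustration, not code from this thread.

```d
import std.format : format;

// Hypothetical helper (an assumption for illustration): hex of each
// decoded code point, with no padding to a fixed width.
string codePointHex(string s)
{
    string result;
    foreach (dchar c; s)            // iterating as dchar decodes code points
        result ~= format("%x", cast(uint) c);
    return result;
}

void main()
{
    // U+00F6 followed by U+2026
    assert(codePointHex("ö…") == "f62026");
    // U+F620 followed by U+0026 yields the same hex string
    assert(codePointHex("\uF620&") == "f62026");
}
```

Two different strings map to the same output, so that particular format cannot be reversed without padding or a separator.
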
Ok, I see that I made a mistake, but I still don't think the conversion is one-way. If we can convert byte-by-byte, we should be able to convert back byte-by-byte, right? What I failed to do was iterate by code units rather than code points. The following is able to get the same string back:

import std.stdio;
import std.string;
import std.algorithm;
import std.range;
import std.utf;
import std.conv;

auto toHex(R)(R input) {
    // As Moritz Maxeiner says, this format is expensive
    return input.byCodeUnit.map!(c => format!"%02x"(c)).joiner;
}

int hexValue(C)(C c) {
    switch (c) {
    case '0': .. case '9':
        return c - '0';
    case 'a': .. case 'f':
        return c - 'a' + 10;
    default:
        assert(false);
    }
}

auto fromHex(R, Dst = char)(R input) {
    return input.chunks(2).map!((ch) {
            auto high = ch.front.hexValue * 16;
            ch.popFront();
            return high + ch.front.hexValue;
        }).map!(value => cast(Dst)value);
}

void main() {
    assert("AAA".toHex.fromHex.equal("AAA"));

    assert("ö…".toHex.fromHex.equal("ö…".byCodeUnit));
    // Alternative check:
    assert("ö…".toHex.fromHex.text.equal("ö…"));
}

Ali