Many functions in std.utf throw UTFException when passed malformed UTF, and many functions in std.string throw StringException. Because of this, I developed a habit of reading user files like so, hoping that it catches all malformed UTF:

    try {
        // call D standard lib on string from file
    }
    catch (Exception e) {
        // treat file as bogus
        // log e.msg
    }

But std.string.stripRight!string calls std.utf.codeLength, which never throws on malformed UTF and instead asserts false on errors:

    ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
        if (isSomeChar!C)
    {
        static if (C.sizeof == 1)
        {
            if (c <= 0x7F) return 1;
            if (c <= 0x7FF) return 2;
            if (c <= 0xFFFF) return 3;
            if (c <= 0x10FFFF) return 4;
            assert(false);
        }
        // ...
    }

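A failed assert raises AssertError, which is an Error rather than an Exception, so my catch block above never sees it. Apparently, before my code calls stripRight, I must already be sure that the string contains only well-formed UTF. Right now, my code doesn't guarantee that.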

Should I always validate text from files manually with std.utf.validate?
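To make the first option concrete, here is a minimal sketch of what I mean by validating up front. The helper name tryStripLine is made up for illustration; the only library calls are std.utf.validate and std.string.stripRight, and I'm assuming that validate rejects everything that could later trip the assert in codeLength:

```d
import std.string : stripRight;
import std.utf : validate, UTFException;

/// Hypothetical helper (not part of Phobos): validates `raw` first and
/// only then hands it to stripRight. Returns false for malformed UTF-8.
bool tryStripLine(const(char)[] raw, out string result)
{
    try
    {
        // validate throws UTFException if raw is not well-formed UTF-8
        validate(raw);
    }
    catch (UTFException e)
    {
        // treat the input as bogus; the caller can log or skip it
        return false;
    }

    // From here on the string is known to be well-formed, so functions
    // that assert instead of throwing should be safe to call.
    result = raw.stripRight.idup;
    return true;
}
```

With this, every string from a file would pass through validate exactly once, and the rest of the code would not need to care which functions throw and which assert.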

Or should I memorize which functions throw, then validate manually whenever I call the non-throwing UTF functions? What is the pattern behind what throws and what asserts false?

-- Simon
