On 10/16/2014 12:43 PM, spir via Digitalmars-d-learn wrote:

> denis

spir is back! :)

On 10/16/2014 11:46 AM, Uranuz wrote:

> I have some string *str* of unicode characters. The question is how to
> check if I have valid unicode code point starting at code unit *index*?

It is easy if I understand the question as skipping over UTF-8 continuation bytes to reach the start of the next code point:

import std.stdio;

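/* Returns the two most significant bits of b, left in place. */
ubyte upperTwoBits(ubyte b)
{
    return b & 0b1100_0000;
}

/* UTF-8 continuation bytes have the bit pattern 10xx_xxxx. */
bool isUtf8ContinuationByte(char c)
{
    enum utf8ContinuationPrefix = 0b1000_0000;
    return upperTwoBits(c) == utf8ContinuationPrefix;
}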

void moveToValid(ref inout(char)[] s)
{
    /* Skip over UTF-8 continuation bytes. */
    while (s.length && isUtf8ContinuationByte(s[0])) {
        s = s[1..$];
    }

    /*
     * The wchar[] overload is too complicated for Ali at this time. :)
     *
     * Please see the following function template in phobos/std/utf.d:
     *
     * private dchar decodeImpl(bool canIndex, S)(...)
     *     if (is(S : const wchar[]) ...
     */
}

unittest
{
    auto s = "çde";
    moveToValid(s);
    assert(s == "çde");

    s = s[1 .. $];
    moveToValid(s);
    assert(s == "de", s);
}

void moveToValid(ref const(dchar)[] s)
{
    /* Every UTF-32 code unit is a whole code point; nothing to skip. */
}

void main()
{}
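
If the question is instead read literally (does a well-formed code point start at code unit *index*?), a minimal sketch could lean on std.utf.decode, which throws UTFException on a malformed sequence. The helper name isValidCodePointAt is made up for illustration and is not part of Phobos; the sketch is meant to live in a file of its own and covers only the char[] (UTF-8) case:

```d
import std.utf : decode, UTFException;

/*
 * Hypothetical helper (not part of Phobos): reports whether a complete,
 * well-formed UTF-8 sequence starts at code unit 'index'.
 */
bool isValidCodePointAt(const(char)[] s, size_t index)
{
    /* decode() expects an index inside the string. */
    if (index >= s.length) {
        return false;
    }

    try {
        /* decode() advances the index it is given, so work on a copy. */
        size_t i = index;
        decode(s, i);
        return true;

    } catch (UTFException e) {
        /* A stray continuation byte or a truncated sequence ends up here. */
        return false;
    }
}

unittest
{
    auto s = "çde";
    assert(isValidCodePointAt(s, 0));          /* lead byte of 'ç' */
    assert(!isValidCodePointAt(s, 1));         /* continuation byte of 'ç' */
    assert(isValidCodePointAt(s, 2));          /* 'd' */
    assert(!isValidCodePointAt(s, s.length));  /* past the end */
}

void main()
{}
```

Relying on decode() keeps the validity rules in one place (Phobos) at the cost of an exception on the failure path; for hot loops, the manual bit test above may be the better fit.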

Ali
