On 7/4/12 10:11 PM, Jonathan M Davis wrote:
On Wednesday, July 04, 2012 21:33:31 Andrei Alexandrescu wrote:
Great. Could you please post some code so we play with it? Thanks.

Okay. You can use this:
[snip]

Thanks. I made the following change to popFront:

@trusted void popFront(A)(ref A a)
if (isNarrowString!A && isMutable!A && !isStaticArray!A)
{
assert(a.length, "Attempting to popFront() past the end of an array of "
            ~ typeof(a[0]).stringof);
    immutable c = a[0];
    if (c < 0x80)
    {
        a = a.ptr[1 .. a.length];
    }
    else
    {
        import core.bitop;
        immutable msbs = 7 - bsr(~c);
        if ((msbs >= 2) & (msbs <= 6))
        {
            a = a[msbs .. $];
        }
        else
        {
            //throw new UTFException("Invalid UTF-8 sequence", 0);
        }
    }
}

For some reason, uncommenting the throwing code makes the function significantly slower. That seems to be an issue with the compiler because putting the throw in a function seems to restore speed.

With the above, I get on a Mac:

ascii 126.61%: old [682 ms, 479 μs, and 3 hnsecs], new [864 ms, 102 μs, and 1 hnsec] uni 86.76%: old [1 sec, 888 ms, 17 μs, and 8 hnsecs], new [1 sec, 638 ms, 76 μs, and 3 hnsecs]

So the ascii string handling became actually 27% faster whereas the uni string handling is 13% slower.

It might be argued that checking for validity is not the metier of popFront; only if you do try to use stuff (e.g. by calling front) should one see exceptions. If popFront sees incorrect characters, it should just skip them one at a time. Following that argument, the implementation may be:

@trusted void popFront(A)(ref A a)
if (isNarrowString!A && isMutable!A && !isStaticArray!A)
{
assert(a.length, "Attempting to popFront() past the end of an array of "
            ~ typeof(a[0]).stringof);
    immutable c = a[0];
    if (c < 0x80)
    {
        a = a.ptr[1 .. a.length];
    }
    else
    {
        import core.bitop;
        auto msbs = 7 - bsr(~c);
        if ((msbs < 2) | (msbs > 6))
        {
            msbs = 1;
        }
        a = a[msbs .. $];
    }
}

With this code I get:

ascii 115.39%: old [744 ms, 103 μs, and 6 hnsecs], new [858 ms, 628 μs, and 4 hnsecs] uni 96.78%: old [1 sec, 877 ms, and 461 μs], new [1 sec, 817 ms, 14 μs, and 3 hnsecs]


Andrei

Reply via email to