On 7/4/12 10:11 PM, Jonathan M Davis wrote:
> On Wednesday, July 04, 2012 21:33:31 Andrei Alexandrescu wrote:
>> Great. Could you please post some code so we play with it? Thanks.
>
> Okay. You can use this:
> [snip]
Thanks. I made the following change to popFront:
    @trusted void popFront(A)(ref A a)
    if (isNarrowString!A && isMutable!A && !isStaticArray!A)
    {
        assert(a.length, "Attempting to popFront() past the end of an array of "
                ~ typeof(a[0]).stringof);
        immutable c = a[0];
        if (c < 0x80)
        {
            a = a.ptr[1 .. a.length];
        }
        else
        {
            import core.bitop;
            immutable msbs = 7 - bsr(~c);
            if ((msbs >= 2) & (msbs <= 6))
            {
                a = a[msbs .. $];
            }
            else
            {
                //throw new UTFException("Invalid UTF-8 sequence", 0);
            }
        }
    }
For some reason, uncommenting the throw makes the function
significantly slower. That looks like a compiler issue, because moving
the throw into a separate function seems to restore the speed.
With the above, I get on a Mac:
    ascii 126.61%: old [682 ms, 479 μs, and 3 hnsecs], new [864 ms, 102 μs, and 1 hnsec]
    uni 86.76%: old [1 sec, 888 ms, 17 μs, and 8 hnsecs], new [1 sec, 638 ms, 76 μs, and 3 hnsecs]
So the ASCII string handling actually became 27% faster, whereas the
Unicode string handling became 13% slower.
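The remark above about moving the throw into a function could look roughly like the sketch below. The names throwInvalidUtf8 and popFrontChecked are hypothetical, the function is restricted to char slices for brevity, and the complement is masked to 8 bits so the result does not depend on whether the compiler promotes ~c to int. None of this was measured here; it only shows the shape of the change.

```d
import core.bitop : bsr;
import std.utf : UTFException;

// Hypothetical helper (not in the original post): keeps the exception
// construction out of the body of popFrontChecked.
private void throwInvalidUtf8()
{
    throw new UTFException("Invalid UTF-8 sequence", 0);
}

// Hypothetical name; handles only char slices, unlike the template above.
@trusted void popFrontChecked(ref const(char)[] a)
{
    assert(a.length, "Attempting to popFront() past the end of an array of char");
    immutable c = a[0];
    if (c < 0x80)
    {
        // ASCII fast path: slice through .ptr to skip the bounds check.
        a = a.ptr[1 .. a.length];
        return;
    }
    // Invert the byte and keep only its low 8 bits (assumption: this is
    // equivalent to the ~c in the post when ~c is not promoted to int).
    immutable inverted = ~cast(uint) c & 0xFF;
    // 0xFF has no zero bit, so bsr would be undefined; treat it as invalid.
    immutable msbs = inverted ? 7 - bsr(inverted) : 8;
    if (msbs < 2 || msbs > 6)
        throwInvalidUtf8();
    a = a[msbs .. $];
}
```

The only difference from the throwing variant in the post is that the `new UTFException(...)` expression lives in its own function, so the body of popFrontChecked contains just a call.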
It could be argued that checking for validity is not popFront's job:
exceptions should surface only when you actually try to use the data
(e.g. by calling front). If popFront encounters invalid characters, it
should just skip them one code unit at a time. Following that argument,
the implementation could be:
    @trusted void popFront(A)(ref A a)
    if (isNarrowString!A && isMutable!A && !isStaticArray!A)
    {
        assert(a.length, "Attempting to popFront() past the end of an array of "
                ~ typeof(a[0]).stringof);
        immutable c = a[0];
        if (c < 0x80)
        {
            a = a.ptr[1 .. a.length];
        }
        else
        {
            import core.bitop;
            auto msbs = 7 - bsr(~c);
            if ((msbs < 2) | (msbs > 6))
            {
                msbs = 1;
            }
            a = a[msbs .. $];
        }
    }
With this code I get:
    ascii 115.39%: old [744 ms, 103 μs, and 6 hnsecs], new [858 ms, 628 μs, and 4 hnsecs]
    uni 96.78%: old [1 sec, 877 ms, and 461 μs], new [1 sec, 817 ms, 14 μs, and 3 hnsecs]
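To make the skipping behaviour of the non-throwing version concrete, here is a small sketch that drains a string and counts the calls. The names popFrontLenient and countPops are hypothetical, the function handles only char slices, and the complement is masked to 8 bits (an adjustment that is not in the code above) so the result does not depend on whether ~c is promoted to int.

```d
import core.bitop : bsr;

// Hypothetical name; same skipping rule as the second popFront above.
@trusted void popFrontLenient(ref const(char)[] a)
{
    assert(a.length, "Attempting to popFront() past the end of an array of char");
    immutable c = a[0];
    if (c < 0x80)
    {
        a = a.ptr[1 .. a.length];
        return;
    }
    // Invert the byte and keep only its low 8 bits.
    immutable inverted = ~cast(uint) c & 0xFF;
    // 0xFF has no zero bit, so bsr would be undefined; skip one byte.
    auto msbs = inverted ? 7 - bsr(inverted) : 1;
    if ((msbs < 2) | (msbs > 6))
        msbs = 1; // invalid lead byte: skip just this one code unit
    a = a[msbs .. $];
}

// Hypothetical helper: number of popFrontLenient calls needed to drain s.
size_t countPops(const(char)[] s)
{
    size_t n;
    for (; s.length; ++n)
        popFrontLenient(s);
    return n;
}
```

With this sketch, a well-formed string takes one call per code point, while a stray continuation byte (0x80) or an invalid byte such as 0xFF takes one call each. A multi-byte sequence truncated at the end of the string would still trip the bounds check on the final slice, as in the code above.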
Andrei