On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
Try this benchmark comparing various classification schemes:

bool isIdentifierChar1(ubyte c)
{
    // true iff c is one of [0-9A-Za-z_$]
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A')  &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}
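
(For concreteness, here's the kind of harness I'm timing with; a minimal sketch of my setup, not Walter's actual benchmark. The buffer size, the byte distribution, and the msecs reporting are my assumptions.)

import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

void main()
{
    // Assumed workload: every byte value 0x00-0xFF cycled over 10M
    // bytes, which is what puts half the input above 0x80.
    auto data = new ubyte[](10_000_000);
    foreach (i, ref b; data)
        b = cast(ubyte)(i & 0xFF);

    auto sw = StopWatch(AutoStart.yes);
    size_t hits;
    foreach (c; data)
        hits += isIdentifierChar1(c);   // bool converts to 0/1
    sw.stop();
    writefln("%s hits in %s msecs", hits, sw.peek.total!"msecs");
}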

I'd like to point out that this is quite a complicated function to begin with, so the result doesn't generalize to all the isXXX ASCII classifiers, whose tests would be substantially simpler.

In any case (on my win32 machine), I can go from 810 msecs to 500 msecs using this function instead:

bool isIdentifierChar1(ubyte c)
{
    // cheapest rejection first: anything above 'z' (including every
    // byte >= 0x80) exits on the very first comparison
    return c <= 'z' && (
            'a' <= c ||
            ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
            c == '$');
}
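
Quick sanity check that the rewrite classifies every byte exactly like the quoted version (a sketch; assume Walter's function above is renamed isIdentifierChar1Orig so both can coexist):

unittest
{
    // Exhaustive: there are only 256 possible inputs, so compare them all.
    foreach (c; 0 .. 256)
        assert(isIdentifierChar1(cast(ubyte) c) ==
               isIdentifierChar1Orig(cast(ubyte) c));
}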

That said, I'm abusing the fact that 50% of your bench consists of chars over 0x80. If I loop only over the ASCII you'd actually find in text (0x20 - 0x80), then those numbers "only" go from 320 msecs to 300 msecs. Only slightly better, but still a win.
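
For that measurement I only change the fill in the harness sketch above so every byte stays in the printable range; something like:

    // Restrict the workload to printable ASCII, 0x20 .. 0x7F.
    foreach (i, ref b; data)
        b = cast(ubyte)(0x20 + (i % 0x60));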

*BUT*, if your functions were to accept any arbitrary codepoint, this version would absolutely murder the original.
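
To make that concrete, this is what I have in mind (a hypothetical dchar overload of the same test, my sketch):

bool isIdentifierChar1(dchar c)
{
    // the first comparison alone throws out every code point above
    // U+007A, so non-ASCII input costs a single branch
    return c <= 'z' && (
            'a' <= c ||
            ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
            c == '$');
}

The quoted version, by contrast, has to fall through its whole chain of range checks before it can reject a large code point.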
