On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
Try this benchmark comparing various classification schemes:
---------------------------------
import core.stdc.stdlib;
import core.stdc.string;
import std.algorithm;
import std.array;
import std.ascii;
import std.datetime;
import std.range;
import std.stdio;
import std.traits;
bool isIdentifierChar0(ubyte c)
{
    return isAlphaNum(c) || c == '_' || c == '$';
}

bool isIdentifierChar1(ubyte c)
{
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A') &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}

immutable bool[256] tab2;

static this()
{
    for (size_t u = 0; u < 0x100; ++u)
    {
        tab2[u] = isIdentifierChar0(cast(ubyte)u);
    }
}

bool isIdentifierChar2(ubyte c)
{
    return tab2[c];
}

/*********************************************/

int f0()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar0(cast(ubyte)u);
    }
    return x;
}

int f1()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar1(cast(ubyte)u);
    }
    return x;
}

int f2()
{
    int x;
    for (uint u = 0; u < 0x100; ++u)
    {
        x += isIdentifierChar2(cast(ubyte)u);
    }
    return x;
}

void main()
{
    auto r = benchmark!(f0, f1, f2)(10_000);
    writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs,
        r[2].msecs);
}
---------------------------------
It's not really possible to tell anything from that benchmark,
especially with fancy modern optimisers and branch predictors.

1) ldc and gdc each optimise away some or all of the tests.

2) Once isIdentifierCharX is inlined, the compiler can determine
the results beforehand.

3) Even without 1) and 2), the inputs are unrealistically (very!)
predictable, leading to a branch-prediction bonanza. I'm sure you
know how significant that can be on modern CPUs. It is also very
friendly to pre-fetching of the lookup table.*

Perhaps run the same benchmark, but on realistic data read from a
file.
*In my experience, the cost of using lookup tables often only
shows up in real code, where cache pressure and branch
predictability are worse. This is especially true of lazy,
range-based code: if several layers of a range pipeline each use
a different lookup table, you can quickly ruin cache performance
compared to an eager approach.