On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:

better:

foreach (...)
{
    auto tmp = std.uni.toLower(fext[0]); // std.string.tolower is gone from Phobos; needs import std.uni;
    if(tmp == "doc" || tmp == "docx"
       || tmp == "xls" || tmp == "xlsx"
       || tmp == "ppt" || tmp == "pptx")
    {
        continue;
    }
}

But it's still not super-fast: unless the compiler is very clever, it makes multiple passes over tmp (one comparison per candidate). It also converts the whole string to lower case, even when that isn't necessary.
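A minimal D sketch of avoiding the lower-cased copy, assuming Phobos's `std.uni.sicmp` (case-insensitive comparison using simple case folding). `officeExtList` and `isOfficeExt` are hypothetical names. It still tries each candidate in turn, so it only addresses the second point, not the multiple passes:

```d
import std.uni : sicmp;

// Hypothetical list of candidate extensions, all lower case.
immutable string[] officeExtList = ["doc", "docx", "xls", "xlsx", "ppt", "pptx"];

/// Case-insensitive membership test that never builds a lower-cased copy
/// of `ext`: sicmp folds case character by character while comparing.
bool isOfficeExt(const(char)[] ext)
{
    foreach (string candidate; officeExtList)
    {
        if (sicmp(ext, candidate) == 0) // 0 means "equal, ignoring case"
            return true;
    }
    return false;
}

unittest
{
    assert(isOfficeExt("DOCX"));
    assert(!isOfficeExt("pdf"));
}
```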

If you have a large number of possible matches, you will probably want to be clever with your data structures/algorithms. For example:

You could create a tree-like structure (a trie) to quickly eliminate possibilities as you read successive letters. Read one character and follow the matching branch; if there is no such branch, there is no match, so break. Otherwise, read the next character, follow the next branch, and so on. This is infeasible for large (or even medium-sized) character sets without hashing, but might be pretty fast for a-z and a large number of short strings.
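A minimal sketch of such a trie in D, assuming extensions consist only of ASCII letters. `TrieNode`, `buildTrie` and `trieContains` are hypothetical names; children are stored as indices into one flat array rather than as pointers:

```d
import std.ascii : toLower;

/// One trie node; children are indices into a flat node array.
struct TrieNode
{
    int[26] next = -1; // child index for each letter 'a'..'z', -1 = no branch
    bool terminal;     // true if one of the words ends at this node
}

/// Builds a trie from lower-case a-z words; nodes[0] is the root.
TrieNode[] buildTrie(const string[] words)
{
    auto nodes = [TrieNode.init];
    foreach (word; words)
    {
        size_t cur = 0;
        foreach (char c; word)
        {
            assert(c >= 'a' && c <= 'z', "this sketch only handles a-z");
            immutable idx = c - 'a';
            if (nodes[cur].next[idx] < 0)
            {
                // Record the new child's index, then append the child.
                nodes[cur].next[idx] = cast(int) nodes.length;
                nodes ~= TrieNode.init;
            }
            cur = nodes[cur].next[idx];
        }
        nodes[cur].terminal = true;
    }
    return nodes;
}

/// Follows one branch per character and gives up as soon as a branch is
/// missing; lower-cases each character on the fly instead of copying.
bool trieContains(const TrieNode[] nodes, const(char)[] s)
{
    size_t cur = 0;
    foreach (char c; s)
    {
        immutable lc = toLower(c);
        if (lc < 'a' || lc > 'z')
            return false; // outside the alphabet: cannot be in the trie
        immutable child = nodes[cur].next[lc - 'a'];
        if (child < 0)
            return false; // no branch for this letter: no match
        cur = child;
    }
    return nodes[cur].terminal; // all letters matched; is it a whole word?
}
```

Each lookup costs one array index per character and stops at the first missing branch. The `int[26]` in every node is exactly what stops this from scaling to larger alphabets without hashing.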

Arguably, you'd do even better with:
foreach (...)
{
    auto tmp = std.uni.toLower(fext[0]); // needs import std.uni;
    switch(tmp)
    {
        case "doc", "docx":
        case "xls", "xlsx":
        case "ppt", "pptx":
            continue;
        default:
            break;
    }
}

This gives the compiler more wiggle room to optimize things as it sees fit. For example, it *could* (who knows :D !) implement the switch as a hash table, or a tree.
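For comparison, the "hash table" strategy can also be written by hand, using a D associative array as a set. This is a sketch with hypothetical names (`officeExtSet`, `isOfficeExtHashed`), and it assumes the caller has already lower-cased the extension, for instance with `std.uni.toLower`:

```d
// Thread-local set of extensions; filled by the module constructor below.
private bool[string] officeExtSet;

static this()
{
    foreach (ext; ["doc", "docx", "xls", "xlsx", "ppt", "pptx"])
        officeExtSet[ext] = true;
}

/// Hashes `lowered` once and probes the table, instead of comparing it
/// against every candidate. Expects an already lower-cased extension.
bool isOfficeExtHashed(string lowered)
{
    return (lowered in officeExtSet) !is null;
}
```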

BTW, a very convenient "tree-like" structure for this is a binary tree stored inside an array, using the same layout as a "heap" (the children of node i sit at indices 2i+1 and 2i+2). You could build it at compile time (e.g. with CTFE), and then simply search it.
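A sketch of that array-stored tree in D, assuming the candidate list is small and known at compile time. The elements of a sorted list are placed in heap order (often called the Eytzinger layout) so that the array is a binary search tree; because the result initializes a module-level `immutable`, the layout is computed by CTFE. All names here are hypothetical:

```d
/// Fills `tree` by an in-order walk of the implicit tree (children of i at
/// 2i+1 and 2i+2), taking elements from `sorted` in order. This makes the
/// array a binary search tree.
void fillInOrder(string[] tree, const string[] sorted, ref size_t next, size_t i)
{
    if (i >= tree.length)
        return;
    fillInOrder(tree, sorted, next, 2 * i + 1); // left subtree: smaller keys
    tree[i] = sorted[next++];
    fillInOrder(tree, sorted, next, 2 * i + 2); // right subtree: larger keys
}

/// Returns the elements of `sorted` (must be ascending) in heap order.
string[] toEytzinger(const string[] sorted)
{
    auto tree = new string[](sorted.length);
    size_t next = 0;
    fillInOrder(tree, sorted, next, 0);
    return tree;
}

// Module-level initializers are evaluated at compile time, so this tree is
// built during compilation. The input list must already be sorted.
immutable string[] officeTree =
    toEytzinger(["doc", "docx", "ppt", "pptx", "xls", "xlsx"]).idup;

/// Binary search over the implicit tree; `key` must already be lower case.
bool treeContains(const string[] tree, const(char)[] key)
{
    size_t i = 0;
    while (i < tree.length)
    {
        if (key == tree[i])
            return true;
        i = key < tree[i] ? 2 * i + 1 : 2 * i + 2; // go left or right
    }
    return false;
}
```

For the six extensions above, the root (`officeTree[0]`) ends up being `"pptx"`, and any lookup touches at most three elements.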

A possible optimization is to first switch on string length:

foreach (...)
{
    auto tmp = std.uni.toLower(fext[0]); // needs import std.uni;
    switch(tmp.length)
    {
        case 3:
            switch(tmp)
            {
                case "doc", "xls", "ppt":
                    continue;
                default:
                    break;
            }
            break;

        case 4:
            switch(tmp)
            {
                case "docx", "xlsx", "pptx":
                    continue;
                default:
                    break;
            }
            break;

        default:
            break;
}

That said, I'm not even sure this would be faster, so a benchmark would be called for. Furthermore, I'd really be tempted to say that at this point, we are in the realm of premature optimization.
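A possible benchmark harness for the three variants discussed here, assuming `std.datetime.stopwatch.benchmark` from recent Phobos. The function names and the input mix are hypothetical, and a meaningful run would use inputs that reflect the real file list:

```d
import core.time : Duration;
import std.datetime.stopwatch : benchmark;

// The three variants, rewritten as functions taking an already
// lower-cased extension (hypothetical names, for benchmarking only).
bool viaIfChain(string e)
{
    return e == "doc" || e == "docx" || e == "xls" || e == "xlsx"
        || e == "ppt" || e == "pptx";
}

bool viaSwitch(string e)
{
    switch (e)
    {
        case "doc", "docx", "xls", "xlsx", "ppt", "pptx":
            return true;
        default:
            return false;
    }
}

bool viaLengthSwitch(string e)
{
    bool found = false;
    switch (e.length)
    {
        case 3:
            switch (e)
            {
                case "doc", "xls", "ppt": found = true; break;
                default: break;
            }
            break;
        case 4:
            switch (e)
            {
                case "docx", "xlsx", "pptx": found = true; break;
                default: break;
            }
            break;
        default:
            break;
    }
    return found;
}

// Hypothetical input mix; the sink helps stop the optimizer from
// discarding calls whose results would otherwise be unused.
immutable string[] benchInputs = ["doc", "pdf", "xlsx", "txt", "pptx", "docm"];
size_t hits;

/// Runs each variant over benchInputs `rounds` times; one Duration each.
Duration[3] runBench(uint rounds)
{
    return benchmark!(
        { foreach (e; benchInputs) hits += viaIfChain(e); },
        { foreach (e; benchInputs) hits += viaSwitch(e); },
        { foreach (e; benchInputs) hits += viaLengthSwitch(e); }
    )(rounds);
}
```

Calling `runBench(1_000_000)` returns the timings in the order the variants are listed. Compile with optimizations (e.g. `-O -release -inline` with DMD, or use LDC) before drawing any conclusions.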
