On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote:
On 5/27/16 8:28 AM, Chris wrote:

This is surprising. It should be easy to construct examples where brute force search performs arbitrarily poor, e.g. searching for "aaaaaaa...aab" in a long string with only "a"s.

I will look into this. Atm, I'm using qznc's benchmark.

NB: I've found that `foreach` is usually faster than a manual `for`-loop.

That's surprising too. I'd be interested to look at a disassembly if you have one.

I have to correct myself. A manual loop is faster, I couldn't believe it either, so I benchmarked the same function with `foreach` and `for`. Turns out that `for` is _way_ faster.

I used qznc's benchmarking, and it's clear that the current
implementation of std.algorithm.find is quite a poor performer. We do have to work on that. Library functions should be fast, no matter what. If someone uses `find`, they wanna be sure it's optimized for speed.

That's definitely the way to go. Thanks for looking into it!


Andrei

There is a bug, if it is one, in qznc's `manual_find`. It doesn't find strings at the end of a string, as "ing" in "string", and it does not find a string that is equal to the search string as in "abba" == "abba".

This outperforms both `manual_find` and the improved `std find`

string findStringS_Manual(string haystack, string needle)
{
        
        if (needle.length > haystack.length)
                return haystack[$..$];
        outer:
        for (auto i = 0; i < haystack.length; i++)
        {
                if (haystack[i] != needle[0])
                        continue;
                for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
                        if (haystack[j] != needle[k])
                        continue outer;
                return haystack[i..$];
        }
        return haystack[$..$];
}

A typical example would be:

std find        took     25642297
manual find     took      7426676  // manual_find
my std find     took      6654496  // improved find
findStringS     took      7159556  // foreach(i, c; haystack)
findStringS_ took 5315498 // for (auto i = 0 ...) see above

I think it's clear that `std find` is veeeery slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version).

Reply via email to