playing around with text search (strutils.find) I realized, that the nim find
implementation uses a Boyer-Moore algorithm. This is nice in many cases, but
there are scenarios where a naive brute force search is faster (small search
patterns and/or a simple one shot search like find("nim is a great programming
language", "great")). The situation gets worse, when a stupid user like me uses
the find repeatedly without precalculating the SkipTable. - I know stupid
programmers write stupid code.
Now I tried to find a better find algorithm. After some googling my first stop
was a Knuth-Morris-Pratt, but in all my benchmarks a simple brute force find
was better (Maybe someone can show me an artificial search string and pattern
where KMP shines? ). Than I had a genius idea and "invented" a rolling hash
search. After implementing it, I googled again and found out that I
re-"invented" a naive form of the Rabin-Karp algorithm :) But to my
astonishment my naive implementation works very well. In all my tests, it is
always better than the naive brute force search. In many cases it beats the BM
and when it looses it is not so bad. But I want to make some more tests to be
sure. Can someone show me some real world test where BM shines - my current
observation is the longer the pattern, the better is BM.
When it turns out like I guess, I would like to suggest the following for the
strutils find:
* find a cut off size for patterns where BM is likely to be better than RK
* for smaller patterns use the RK
* for longer patterns use the BM
what do you think ?