On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote:
https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/

"the longer the pattern is, the faster the algorithm goes"

Yes, that's how substring search works in the standard libraries of the other programming languages. Now please take the torhu's test program (posted a few comments above) and generate input for it using

    python -c "print(('a' * 49 + 'b') * 20000)" > test.lst

Run it to search for "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (the character 'a' replicated 50 times) and then compare its performance against the other similar programs doing the same thing:

Python:

```Python
import sys
assert len(sys.argv) >= 2, "Need a needle argument."
needle = sys.argv[1].lower()
print(sum([1 if line.lower().find(needle) != -1 else 0 for line in open("test.lst")]))
```

Ruby/Crystal:

```Ruby
abort "Need a needle argument." unless ARGV.size >= 1
needle = ARGV[0].downcase
puts File.open("test.lst").each_line.sum {|line| line.downcase.index(needle) ? 1 : 0 }
```

A generic substring search is tuned to be fast on any input in the other programming languages. But in Phobos a simpleminded slow search algorithm is used by default. The Boyer-Moore algorithm can be used too if it's explicitly requested, but it has some gotchas of its own.

Reply via email to