On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote:
https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/
"the longer the pattern is, the faster the algorithm goes"
Yes, that's how substring search works in the standard libraries
of the other programming languages. Now please take the torhu's
test program (posted a few comments above) and generate input for
it using
python -c "print(('a' * 49 + 'b') * 20000)" > test.lst
Run it to search for
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (the
character 'a' replicated 50 times) and then compare its
performance against the other similar programs doing the same
thing:
Python:
```Python
import sys
assert len(sys.argv) >= 2, "Need a needle argument."
needle = sys.argv[1].lower()
print(sum([1 if line.lower().find(needle) != -1 else 0 for line
in open("test.lst")]))
```
Ruby/Crystal:
```Ruby
abort "Need a needle argument." unless ARGV.size >= 1
needle = ARGV[0].downcase
puts File.open("test.lst").each_line.sum {|line|
line.downcase.index(needle) ? 1 : 0 }
```
A generic substring search is tuned to be fast on any input in
the other programming languages. But in Phobos a simpleminded
slow search algorithm is used by default. The Boyer-Moore
algorithm can be used too if it's explicitly requested, but it
has some gotchas of its own.