On 2/28/2023 10:05 AM, Roel Schroeven wrote:
Op 28/02/2023 om 14:35 schreef Thomas Passin:
On 2/28/2023 4:33 AM, Roel Schroeven wrote:
[...]
(2) Searching for a string in another string, in a performant way, is not as simple as it first appears. Your version works correctly, but slowly. In some situations it doesn't matter, but in other cases it will. For better performance, string searching algorithms jump ahead either when they found a match or when they know for sure there isn't a match for some time (see e.g. the Boyer–Moore string-search algorithm). You could write such a more efficient algorithm, but then it becomes more complex and more error-prone. Using a well-tested existing function becomes quite attractive.

Sure, it all depends on what the real task will be.  That's why I wrote "Without knowing how general your expressions will be". For the example string, it's unlikely that speed will be a factor, but who knows what target strings and keys will turn up in the future?
On hindsight I think it was overthinking things a bit. "It all depends on what the real task will be" you say, and indeed I think that should be the main conclusion here.


It is interesting, though, how pre-processing the search pattern can improve search times if you can afford the pre-processing. Here's a paper on rapidly finding matches when there may be up to one misspelled character. It's easy enough to implement, though in Python you can't take the additional step of tuning it to stay in cache.

https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to