On 2023-02-28, Thomas Passin <li...@tompassin.net> wrote: > On 2/28/2023 10:05 AM, Roel Schroeven wrote: >> Op 28/02/2023 om 14:35 schreef Thomas Passin: >>> On 2/28/2023 4:33 AM, Roel Schroeven wrote: >>>> [...] >>>> (2) Searching for a string in another string, in a performant way, is >>>> not as simple as it first appears. Your version works correctly, but >>>> slowly. In some situations it doesn't matter, but in other cases it >>>> will. For better performance, string searching algorithms jump ahead >>>> either when they found a match or when they know for sure there isn't >>>> a match for some time (see e.g. the Boyer–Moore string-search >>>> algorithm). You could write such a more efficient algorithm, but then >>>> it becomes more complex and more error-prone. Using a well-tested >>>> existing function becomes quite attractive. >>> >>> Sure, it all depends on what the real task will be. That's why I >>> wrote "Without knowing how general your expressions will be". For the >>> example string, it's unlikely that speed will be a factor, but who >>> knows what target strings and keys will turn up in the future? >> On hindsight I think it was overthinking things a bit. "It all depends >> on what the real task will be" you say, and indeed I think that should >> be the main conclusion here. > > It is interesting, though, how pre-processing the search pattern can > improve search times if you can afford the pre-processing. Here's a > paper on rapidly finding matches when there may be up to one misspelled > character. It's easy enough to implement, though in Python you can't > take the additional step of tuning it to stay in cache. > > https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf
You've somehow title-cased that URL. The correct URL is: https://robert.muth.org/Papers/1996-approx-multi.pdf -- https://mail.python.org/mailman/listinfo/python-list