Re: How to escape strings for re.finditer?

Thomas Passin Tue, 28 Feb 2023 07:54:15 -0800

On 2/28/2023 10:05 AM, Roel Schroeven wrote:

Op 28/02/2023 om 14:35 schreef Thomas Passin:
On 2/28/2023 4:33 AM, Roel Schroeven wrote:
[...]
(2) Searching for a string in another string, in a performant way, is not as simple as it first appears. Your version works correctly, but slowly. In some situations it doesn't matter, but in other cases it will. For better performance, string searching algorithms jump ahead either when they found a match or when they know for sure there isn't a match for some time (see e.g. the Boyer–Moore string-search algorithm). You could write such a more efficient algorithm, but then it becomes more complex and more error-prone. Using a well-tested existing function becomes quite attractive.
Sure, it all depends on what the real task will be. That's why I wrote "Without knowing how general your expressions will be". For the example string, it's unlikely that speed will be a factor, but who knows what target strings and keys will turn up in the future?
In hindsight I think it was overthinking things a bit. "It all depends on what the real task will be" you say, and indeed I think that should be the main conclusion here.
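
For readers landing on this message without the rest of the thread, here is a minimal sketch of what "using a well-tested existing function" could look like for literal keys. The function names find_all_re and find_all_str are made up for illustration and are not the code posted earlier in the thread. The first relies on re.escape so that characters such as "." are matched literally; the second uses str.find and jumps past each match.

```python
import re


def find_all_re(key, target):
    """Return (start, end) spans of non-overlapping literal matches of key."""
    if not key:
        raise ValueError("key must not be empty")
    # re.escape makes regex metacharacters in key match themselves.
    return [m.span() for m in re.finditer(re.escape(key), target)]


def find_all_str(key, target):
    """Return the same spans using str.find, resuming after each match."""
    if not key:
        raise ValueError("key must not be empty")
    spans = []
    start = target.find(key)
    while start != -1:
        end = start + len(key)
        spans.append((start, end))
        # Jump ahead: continue searching from the end of this match.
        start = target.find(key, end)
    return spans


print(find_all_re("a.b", "a.b axb a.b"))   # [(0, 3), (8, 11)]
print(find_all_str("a.b", "a.b axb a.b"))  # [(0, 3), (8, 11)]
```

Without re.escape, the pattern "a.b" would also match "axb" at (4, 7), which is the escaping problem the subject line refers to.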

It is interesting, though, how pre-processing the search pattern can improve search times if you can afford the pre-processing. Here's a paper on rapidly finding matches when there may be up to one misspelled character. It's easy enough to implement, though in Python you can't take the additional step of tuning it to stay in cache.


https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf
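
To give a rough feel for the pre-processing idea, here is a small sketch of my own. It is not the algorithm from the paper, and it only handles one substituted character (no insertions or deletions). The assumption is that two equal-length strings differ in at most one position exactly when deleting the same position from both leaves identical strings, so the pattern's one-deletion variants can be computed once up front. The names preprocess and find_with_one_mismatch are hypothetical.

```python
def preprocess(pattern):
    """Build the set of (position, pattern with that character removed)."""
    return {(i, pattern[:i] + pattern[i + 1:]) for i in range(len(pattern))}


def find_with_one_mismatch(pattern, text):
    """Return start offsets where text matches pattern with <= 1 substitution."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    variants = preprocess(pattern)  # done once, before scanning the text
    hits = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        # Delete each position from the window and look it up in the set.
        if any((i, window[:i] + window[i + 1:]) in variants for i in range(m)):
            hits.append(start)
    return hits


print(find_with_one_mismatch("cat", "a cat, a cot, a dog"))  # [2, 9]
```

This version is written for clarity rather than speed; it only shows where the pre-processing step fits.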

