Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

I agree that skip could could do 1 better.

---

> I don't know whether it "should be" applied

I don't think I'm convinced: the second check fixes only the very specific case 
when s[len(p):].startswith(p).
Perturbations of reproducer.py are still very slow with the patch:

    - pattern2 = b"B" * BIG
    + pattern2 = b"B" * (BIG + 10)

In fact, it's even slower than before due to the double-checking.

---

I'd be curious how something like Crochemore and Perrin's "two-way" algorithm 
would do (constant space, compares < 2n-m):

    http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
    https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm

Apparently it's been used as glibc's memmem() since 2008. There, it 
additionally computes a table, but only for long needles where it really pays 
off:

    https://code.woboq.org/userspace/glibc/string/str-two-way.h.html
    https://code.woboq.org/userspace/glibc/string/memmem.c.html

Although there certainly seems to be a complexity rabbithole here that would be 
easy to over-do.
I might look more into this.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to