Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:
I agree that skip could could do 1 better. --- > I don't know whether it "should be" applied I don't think I'm convinced: the second check fixes only the very specific case when s[len(p):].startswith(p). Perturbations of reproducer.py are still very slow with the patch: - pattern2 = b"B" * BIG + pattern2 = b"B" * (BIG + 10) In fact, it's even slower than before due to the double-checking. --- I'd be curious how something like Crochemore and Perrin's "two-way" algorithm would do (constant space, compares < 2n-m): http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260 https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm Apparently it's been used as glibc's memmem() since 2008. There, it additionally computes a table, but only for long needles where it really pays off: https://code.woboq.org/userspace/glibc/string/str-two-way.h.html https://code.woboq.org/userspace/glibc/string/memmem.c.html Although there certainly seems to be a complexity rabbithole here that would be easy to over-do. I might look more into this. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue41972> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com