Dennis Sweeney <[email protected]> added the comment:
Another algorithmic possibility: Instead of the bitset, we could have a
stack-allocated
uint8_t jump[32]; // maybe 64? Maybe uint16_t?
It would say this: If the last character lined up in the haystack is congruent
to i mod (1 << 8), then jump ahead by (neede_len if jump[i]==255 else jump[i]),
where jump[i] gives the distance between the end of the needle and the last
occurrence in the needle of a character congruent to i.
Is this sort of constant-but-more-than-a-few-bytes stack-space an acceptable
constant memory cost? If so, I believe that could be a big improvement.
There are also a bunch of little tweaks that could be done here: For example,
should a hypothetical jump-table jump to line up the last character in the
needle, or jump to line up the middle character in the needle? The max from two
tables? Should we search for the last characters to be equal, or just make sure
the last character is in the needle-bit-set (like in the PR)? Should there be a
second bit-set for the right half of the string? Should there be a skip value
computed for the last character as well as the middle character (middle
character only in the PR)? etc. etc. I'll be sure to try some of these things.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com