Antoine Pitrou <[email protected]> added the comment:

> There are occasions when you want to do string slicing, often of the form:
>
> pos = my_str.index(x)
> endpos = my_str.index(y)
> substring = my_str[pos : endpos]
>
> To me that suggests that if UTF-8 is used then it may be worth
> profiling to see whether caching the last 2 positions would be
> beneficial.

And/or a lookup table giving the byte offset of, say, every 16th
character. It gives you an O(1) lookup with a relatively reasonable
constant cost (you have to scan fewer than 16 characters after the
lookup). On small strings (< 256 UTF-8 bytes), each offset fits in a
single byte, so the space overhead for the lookup table would be at
most 1/16. It could also be constructed lazily whenever more than
2 positions are cached.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12729>
_______________________________________
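
For illustration only, here is a rough Python sketch of the lookup-table
idea described above. It is not taken from any CPython patch: the names
build_offset_table, byte_offset and STRIDE are hypothetical, the input is
assumed to be valid UTF-8, and the table is a plain list rather than a
compact one-byte-per-entry array.

```python
STRIDE = 16  # one table entry per 16 characters (the figure from the comment)


def build_offset_table(data: bytes, stride: int = STRIDE) -> list[int]:
    """Return the byte offset of every `stride`-th character of `data`.

    `data` is assumed to be valid UTF-8.
    """
    table = []
    char_index = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes; any other
        # byte starts a new character.
        if byte & 0xC0 != 0x80:
            if char_index % stride == 0:
                table.append(offset)
            char_index += 1
    return table


def byte_offset(data: bytes, table: list[int], index: int,
                stride: int = STRIDE) -> int:
    """Map a character index (0 <= index <= character count) to a byte offset."""
    if index < 0:
        raise IndexError("negative index")
    # Jump to the nearest table entry at or before `index` ...
    block = min(index // stride, len(table) - 1)
    if block < 0:  # empty string, hence empty table
        pos, remaining = 0, index
    else:
        pos, remaining = table[block], index - block * stride
    # ... then scan forward over the few remaining characters.
    while remaining > 0:
        if pos >= len(data):
            raise IndexError("character index out of range")
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
        remaining -= 1
    return pos


# The slicing pattern from the quoted message, done on the UTF-8 bytes.
text = "naïve café, 日本語 text"
data = text.encode("utf-8")
table = build_offset_table(data)
pos, endpos = text.index("café"), text.index("text")
start = byte_offset(data, table, pos)
end = byte_offset(data, table, endpos)
assert data[start:end].decode("utf-8") == text[pos:endpos]
```

The table jump is constant time and the scan that follows is bounded by
the stride, which is where the constant cost comes from. The lazy
construction mentioned above would simply defer build_offset_table until
the position cache overflows.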
