New submission from Serhiy Storchaka:

Searching in strings is highly optimized for the common case. However, for some input data, searching in a non-ASCII string becomes unexpectedly slow. Compare:
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"є" in s'
100000 loops, best of 3: 11.7 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"Є" in s'
1000 loops, best of 3: 769 usec per loop

The second search is about 65 times slower. This is because the lowest byte of the code of the Ukrainian capital letter Є (U+0404) matches the highest byte of the codes of most Cyrillic letters (U+04xx), so a byte-level fast path finds a false match at almost every character. Some other scripts have similar issues. I think we should use a more robust optimization.

----------
assignee: serhiy.storchaka
components: Interpreter Core
messages: 248179
nosy: haypo, pitrou, serhiy.storchaka
priority: low
severity: normal
stage: needs patch
status: open
title: The optimization of string search can cause pessimization
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24821>
_______________________________________
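To make the byte collision concrete, here is a small pure-Python model. It is a sketch under stated assumptions, not CPython's actual search code: it assumes the fast path scans a 2-byte-per-character (UCS-2), little-endian buffer for the lowest byte of the searched character, and it models that buffer with UTF-16-LE. The helper name `low_byte_candidates` is hypothetical. It counts how many positions such a single-byte scan would stop at; each stop needs a full comparison before the search can continue.

```python
def low_byte_candidates(text: str, ch: str) -> int:
    """Count the bytes a "scan for the low byte first" prefilter would stop at.

    Hypothetical model: assumes a little-endian UCS-2 buffer, approximated
    by UTF-16-LE, so every character must be in the BMP (<= U+FFFF).
    """
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected a single BMP character")
    buf = text.encode("utf-16-le")
    needle = bytes([ord(ch) & 0xFF])  # lowest byte of the code point
    # Each occurrence of this byte is a candidate the real search must verify.
    return buf.count(needle)


if __name__ == "__main__":
    s = "\u0410\u0411\u0412\u0413\u0414" * 10**4  # "АБВГД" * 10**4
    print(low_byte_candidates(s, "\u0454"))  # є, U+0454: low byte 0x54
    print(low_byte_candidates(s, "\u0404"))  # Є, U+0404: low byte 0x04
```

With the strings from the timeit runs above, searching for є (low byte 0x54) stops at no bytes at all, while Є (low byte 0x04) stops at the high byte of every one of the 50000 characters, which matches the slowdown reported.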