New submission from Serhiy Storchaka:

Search in strings is highly optimized for the common case. However, for some input
data, searching in a non-ASCII string becomes unexpectedly slow. Compare:

$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"є" in s'
100000 loops, best of 3: 11.7 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**4' -- '"Є" in s'
1000 loops, best of 3: 769 usec per loop

This is because the low byte of the code point of the Ukrainian capital letter Є (U+0404)
matches the high byte of the code points of most Cyrillic letters (U+04xx), whereas the
low byte of the lowercase є (U+0454), 0x54, does not. There are similar issues with
some other scripts.
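The effect can be illustrated with a rough Python model of a byte-oriented, memchr-style scan. This is a sketch under stated assumptions, not CPython's C code: it assumes the string is stored as 2-byte little-endian units (as CPython's UCS-2 representation is on little-endian machines) and that the fast path looks for the low byte of the wanted character, then checks the aligned 2-byte unit. The helper name `low_byte_candidates` is hypothetical. It counts how often such a scan stops before finding a real match or reaching the end.

```python
def low_byte_candidates(haystack: str, ch: str) -> int:
    """Count the stops made by a hypothetical scan for the low byte of `ch`
    over the 2-byte little-endian representation of `haystack`."""
    code = ord(ch)
    if code > 0xFFFF or any(ord(c) > 0xFFFF for c in haystack):
        raise ValueError("model assumes BMP-only (UCS-2) text")
    # For BMP text without surrogates, UTF-16-LE bytes equal UCS-2 LE bytes.
    data = haystack.encode("utf-16-le")
    needle = bytes([code & 0xFF])  # assumption: only the low byte is searched
    stops = 0
    pos = data.find(needle)
    while pos != -1:
        stops += 1
        index = pos // 2  # align down to the start of the 2-byte unit
        if ord(haystack[index]) == code:
            break  # genuine match
        # False positive: resume the byte search at the next 2-byte unit.
        pos = data.find(needle, 2 * index + 2)
    return stops


s = "\u0410\u0411\u0412\u0413\u0414" * 10**4  # "АБВГД" * 10**4
print(low_byte_candidates(s, "\u0404"))  # Є: stops at every character (50000)
print(low_byte_candidates(s, "\u0454"))  # є: never stops (0)
```

In this model, searching for Є stops on every one of the 50000 characters, because the byte 0x04 is the high byte of each of them, while searching for є never stops at all; each stop in the real C code costs a check plus another call into the byte search.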

I think we should use a more robust optimization.

----------
assignee: serhiy.storchaka
components: Interpreter Core
messages: 248179
nosy: haypo, pitrou, serhiy.storchaka
priority: low
severity: normal
stage: needs patch
status: open
title: The optimization of string search can cause pessimization
type: performance
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24821>
_______________________________________
_______________________________________________
Python-bugs-list mailing list