New submission from Arnim Rupp <er...@rupp.de>:
The re module needs about 7 seconds to check whether a string of a billion As starts with an x. For example, this statement takes that long:

re.search(r'^x', 'A' * 1000000000)

The longer the string, the longer it takes. The string handling is not the problem: checking whether the same string starts with an A takes just 0.00014 seconds. See the output and code below.

Output:

3.10.0a4+ (heads/master:d16f617, Jan 9 2021, 13:24:45)
[GCC 7.5.0]
testing string len: 100000
re_test_false: 0.0008246829966083169
testing string len: 1000000000
re_test_false: 7.317708015005337
testing string len: 1000000000
re_test_true: 0.00014710200048284605

Code:

import re, timeit, functools, sys

def re_test_true(string):
    print("testing string len: ", len(string))
    # '^A' matches at position 0 of a string of As
    re.search(r'^A', string)

def re_test_false(string):
    print("testing string len: ", len(string))
    # '^x' cannot match; this is the slow case reported above
    re.search(r'^x', string)

print(sys.version)

huge_string = 'A' * 100000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))

huge_string = 'A' * 1000000000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))
print('re_test_true: ', timeit.timeit(functools.partial(re_test_true, huge_string), number=1))

----------
components: Library (Lib)
files: regex_timeit.py
messages: 384782
nosy: another_try
priority: normal
severity: normal
status: open
title: Regex performance problem with ^ aka AT_BEGINNING
type: performance
versions: Python 3.10

Added file: https://bugs.python.org/file49733/regex_timeit.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42885>
_______________________________________
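A minimal sketch, assuming only the standard library, of ways to ask "does this string start with x?" that do not depend on how re.search() handles a ^-anchored pattern. Without the MULTILINE flag, ^ only matches at the start of the string, so re.search(r'^x', s), re.match(r'x', s) (which only attempts a match at position 0) and s.startswith('x') give the same answer. The helper names starts_with_x_search and starts_with_x_match are hypothetical, introduced only for this illustration; timings depend on the machine and Python version, so no expected numbers are shown.

```python
import re
import timeit

# Hypothetical helper: the ^-anchored search used in the report.
def starts_with_x_search(s: str) -> bool:
    return re.search(r'^x', s) is not None

# Hypothetical helper: re.match only tries to match at position 0.
def starts_with_x_match(s: str) -> bool:
    return re.match(r'x', s) is not None

if __name__ == "__main__":
    for n in (10**5, 10**7):
        s = 'A' * n
        # Time each variant once on the same string of As.
        t_search = timeit.timeit(lambda: starts_with_x_search(s), number=1)
        t_match = timeit.timeit(lambda: starts_with_x_match(s), number=1)
        t_plain = timeit.timeit(lambda: s.startswith('x'), number=1)
        print(f"len={n}: search={t_search:.6f}s "
              f"match={t_match:.6f}s startswith={t_plain:.6f}s")
```

If the search timing grows with len(s) while the other two stay flat, that reproduces the behaviour described in this report on the interpreter being tested.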