On 07/14/2013 02:17 PM, 88888 Dihedral wrote:
On Saturday, July 13, 2013 1:37:46 PM UTC+8, Steven D'Aprano wrote:
On Fri, 12 Jul 2013 13:58:29 -0400, Devyn Collier Johnson wrote:

I plan to spend some time optimizing the re.py module for Unix systems.
I would love to amp up my programs that use that module.

In my experience, often the best way to optimize a regex is to not use it at all.

[steve@ando ~]$ python -m timeit -s "import re" \
-s "data = 'a'*100+'b'" \
"if re.search('b', data): pass"
100000 loops, best of 3: 2.77 usec per loop

[steve@ando ~]$ python -m timeit -s "data = 'a'*100+'b'" \
"if 'b' in data: pass"
1000000 loops, best of 3: 0.219 usec per loop

In Python, we often use plain string operations instead of regex-based
solutions for basic tasks. Regexes are a 10lb sledge hammer. Don't use
them for cracking peanuts.
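
As a quick illustration of that advice (my own sketch, not from the
original post), here are a few common checks written with plain string
methods next to the regex calls they replace:

import re

data = "error: disk full on /dev/sda1"   # made-up example string

# Substring test: the "in" operator vs re.search
assert ("disk" in data) == bool(re.search("disk", data))

# Prefix test: str.startswith vs an anchored re.match
assert data.startswith("error:") == bool(re.match(r"error:", data))

# Literal replacement: str.replace vs re.sub
assert data.replace("disk", "DISK") == re.sub("disk", "DISK", data)

In each case the string method skips compiling and running a regex
engine for what is really a fixed-text operation.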

--
Steven
OK, let's talk about indexed search algorithms for a character stream
or string that can be buffered and indexed randomly for read/write
operations, but is faster for sequential block read/write operations
after some pre-processing.

This was solved a long time ago with suffix arrays and suffix trees,
and summarized in the famous BWT paper from 1994.
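
For the curious, here is a minimal, deliberately naive sketch of that
idea in pure Python (my own illustration, not from the thread): build
a suffix array once, then answer substring queries with a binary
search over the sorted suffixes. A real implementation would use an
O(n log n) or linear-time construction.

def build_suffix_array(text):
    # Naive construction: sort all suffix start positions by suffix.
    # Fine for a demo; far too slow for large inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_contains(text, suffix_array, pattern):
    # Binary search for the first suffix >= pattern, then check
    # whether that suffix actually starts with the pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return (lo < len(suffix_array)
            and text[suffix_array[lo]:].startswith(pattern))

data = "a" * 100 + "b"
sa = build_suffix_array(data)
print(sa_contains(data, sa, "ab"))   # True
print(sa_contains(data, sa, "bb"))   # False

The pre-processing cost is the part the suffix-array/BWT literature
worries about; once the index exists, each query needs only about
O(m log n) character comparisons instead of a scan of the whole string.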

Do we want volunteers to speed up
search operations in the string module in Python?
It would be nice if someone could speed it up.
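
One caveat before anyone volunteers: CPython's str.find and "in"
substring search are already implemented in C (as far as I know, a
Boyer-Moore-Horspool-style "fastsearch"), so a pure-Python
replacement is very unlikely to beat them. Here is a small sketch of
mine, with a made-up test string, for measuring any candidate against
the built-in:

import timeit

def horspool_find(text, pattern):
    # Pure-Python Boyer-Moore-Horspool, for comparison only.
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: distance from each pattern char (except the
    # last one) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = m - 1
    while i < n:
        j, k = m - 1, i
        while j >= 0 and text[k] == pattern[j]:
            j -= 1
            k -= 1
        if j < 0:
            return k + 1          # full match found
        i += shift.get(text[i], m)
    return -1

data = "a" * 10000 + "needle" + "a" * 10000   # made-up test string
assert horspool_find(data, "needle") == data.find("needle")
print(timeit.timeit(lambda: data.find("needle"), number=1000))
print(timeit.timeit(lambda: horspool_find(data, "needle"), number=1000))

On any recent CPython the built-in should win by a wide margin, which
is why a real speedup would have to happen at the C level.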