[issue26436] Add the regex-dna benchmark

Serhiy Storchaka Tue, 01 Mar 2016 02:02:46 -0800

Serhiy Storchaka added the comment:

I used the code from the fasta and regex-dna tests almost without changes. That
is, one part creates the data in the standard FASTA format (with 60-character
lines and headers), and the other part parses this format. The code could be
simpler if it generated and consumed raw data.
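For readers unfamiliar with the format, the split between a generating half and a parsing half can be sketched as follows. This is a minimal illustration, not the benchmark's code. `format_fasta` and `parse_fasta` are hypothetical helpers, and the header-stripping pattern follows the cleanup step used by regex-dna style programs.

```python
import re

LINE_WIDTH = 60  # standard FASTA line width

def format_fasta(header: str, seq: str, width: int = LINE_WIDTH) -> str:
    """Wrap a raw sequence as FASTA: a '>' header line, then fixed-width lines."""
    lines = [">" + header]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

def parse_fasta(text: str) -> str:
    """Return the raw sequence: drop '>' header lines and all line breaks."""
    return re.sub(r">.*\n|\n", "", text)

if __name__ == "__main__":
    raw = "GATTACA" * 20                      # 140 characters
    fasta = format_fasta("ONE example", raw)
    print(fasta)                              # header, then lines of 60, 60 and 20
    assert parse_fasta(fasta) == raw          # the round trip recovers the raw data
```

Generating and consuming `raw` directly would skip both helpers. That is the simplification mentioned above.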


As for the quality of the code, the tested code is pretty simple and Pythonic
enough. Yes, using replace() is more idiomatic and faster, but we are testing
regular expressions. bytes.translate() doesn't work with a dict, and
str.translate() is slower than replace() or re.sub().
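The trade-off can be illustrated with the IUB code substitution step of regex-dna. This sketch uses only three of the substitution codes, and the helper names are hypothetical. It shows that replace(), re.sub() and str.translate() with a dict give the same result. It also shows that bytes.translate() rejects a dict, because it expects a 256-byte table and can only map one byte to one byte.

```python
import re

# Three of the IUB ambiguity codes that regex-dna expands (subset for brevity)
SUBST = {
    "B": "(c|g|t)",
    "D": "(a|g|t)",
    "H": "(a|c|t)",
}

def with_replace(seq: str) -> str:
    # Idiomatic and fast, but does not exercise the re module at all
    for code, alt in SUBST.items():
        seq = seq.replace(code, alt)
    return seq

def with_re_sub(seq: str) -> str:
    # One re.sub() pass per code: this is the work a regex benchmark wants to time
    for code, alt in SUBST.items():
        seq = re.sub(code, alt, seq)
    return seq

def with_str_translate(seq: str) -> str:
    # str.translate() accepts a dict mapping code points to replacement strings
    return seq.translate({ord(code): alt for code, alt in SUBST.items()})

def bytes_translate_accepts_dict() -> bool:
    # bytes.translate() needs a 256-byte table (or None), so a dict is rejected
    try:
        b"BDH".translate({ord("B"): b"(c|g|t)"})
    except TypeError:
        return False
    return True
```

For example, all three string helpers turn `"aBgDtH"` into `"a(c|g|t)g(a|g|t)t(a|c|t)"`. Only `with_re_sub` puts load on the regular expression engine.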

The code for generating the test data is not the kind of code that should be
used in tutorials. It is highly optimized code that uses various optimization
tricks that could be hard to understand without comments, but there is nothing
unpythonic in it. It could be simplified by not formatting the data in the
standard FASTA format.

> I would add another kind of question: is it stressing something useful that 
> isn't already stressed by the two other regex benchmarks we already have?

Yes, it is. The regex_v8 benchmark is 2x faster with the third-party regex
module than with re, but the regex_dna benchmark is 1.6x slower with regex than
with re. Thus these tests stress different aspects of regular expressions.

It may also be worth testing regular expressions with Unicode strings. I expect
some difference between the latest Python and earlier 3.x versions and 2.7. The
question is how to do this: add a special option to switch between bytes and
Unicode (like --force_bytes in regex_effbot), or just run the tests for bytes
and Unicode sequentially and add the results?
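The second option, running the same workload on bytes and then on Unicode, could look roughly like this. This is a sketch under assumptions, not the perf suite's API. `count_variants` and `bench_both` are hypothetical names, and `PATTERNS` holds just two of the regex-dna variant patterns. The main subtlety is that re refuses to apply a str pattern to bytes, so each pattern must match the type of the data.

```python
import re
import time

# Two of the regex-dna "variant" patterns, used here as a small sample
PATTERNS = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
]

def count_variants(seq, patterns):
    """Count non-overlapping matches of each pattern in seq (str or bytes).

    re refuses to apply a str pattern to bytes, so each pattern is encoded
    to match the type of the subject.
    """
    if isinstance(seq, bytes):
        patterns = [p.encode("ascii") for p in patterns]
    return [len(re.findall(p, seq)) for p in patterns]

def bench_both(seq, patterns, repeat=3):
    """Run the same workload on str, then on bytes; return the best time of each."""
    timings = {}
    for label, subject in (("unicode", seq), ("bytes", seq.encode("ascii"))):
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            count_variants(subject, patterns)
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    return timings
```

Keeping the two timings separate makes a bytes-versus-Unicode difference visible. Summing them gives a single number per run, which is the "add results" variant.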

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26436>
_______________________________________