On 11/27/2010 11:53 AM, bearophile wrote:
While translating a common Python script to D, I have found something
interesting, so I have reduced it to a little benchmark.
Below there is the reduced Python2 code (it uses Psyco), and a little program
to generate some test data. The timing of the first D2 version is not good
compared to the Python-Psyco program (the generation of the *300 array is a
quick thing), so I have created two more D2 versions to show myself that D2
wasn't broken :-)
The reduced code looks like a syntetic benchmark, but it has about the same
performance profile of a 60 lines long Python script (the original code was
using xrange(0,len(...)-3,3) instead of xrange(len(...)-3), but the situation
doesn't change much).
It's not clear to me if the point of your post is "how do I make this go
faster?" or "Waa! D underperforms Python by default".
If it is the former, a state machine is probably the right answer for
long input strings.
start -> E
E A,C,G:-> E
E T:-> T
T A:-> TA
T C:-> E
T G:-> TG
T T:-> T
TA A,G:-> E, {++count}
TA C:-> E
TA T:-> T
TG A:-> E, {++count}
TG C,G:-> E
TG T:-> T
Since the probability of failure is about .75, 0.5, {.5 or .75}, the
overall comparison cost is about 1 + .25 + .125 = 1.375, times
compare+loop times n, plus all the function call overhead, etc.
Using a DFA should reduce the cost to 1 * index, times n, assuming you
are using a small alphabet (I'm guessing that your codons are
ascii+uppercase only, table-size = 26).
=Austin