On 11/27/2010 11:53 AM, bearophile wrote:
While translating a common Python script to D, I have found something 
interesting, so I have reduced it to a little benchmark.

Below there is the reduced Python2 code (it uses Psyco), and a little program 
to generate some test data. The timing of the first D2 version is not good 
compared to the Python-Psyco program (the generation of the *300 array is a 
quick thing), so I have created two more D2 versions to show myself that D2 
wasn't broken :-)

The reduced code looks like a syntetic benchmark, but it has about the same 
performance profile of a 60 lines long Python script (the original code was 
using xrange(0,len(...)-3,3)  instead of xrange(len(...)-3), but the situation 
doesn't change much).


It's not clear to me if the point of your post is "how do I make this go faster?" or "Waa! D underperforms Python by default".

If it is the former, a state machine is probably the right answer for long input strings.

start -> E

E A,C,G:-> E
E T:-> T

T A:-> TA
T C:-> E
T G:-> TG
T T:-> T

TA A,G:-> E, {++count}
TA C:-> E
TA T:-> T

TG A:-> E, {++count}
TG C,G:-> E
TG T:-> T


Since the probability of failure is about .75, 0.5, {.5 or .75}, the overall comparison cost is about 1 + .25 + .125 = 1.375, times compare+loop times n, plus all the function call overhead, etc.

Using a DFA should reduce the cost to 1 * index, times n, assuming you are using a small alphabet (I'm guessing that your codons are ascii+uppercase only, table-size = 26).

=Austin

Reply via email to