Re: String compare performance

Austin Hastings Sat, 27 Nov 2010 22:15:22 -0800

On 11/27/2010 11:53 AM, bearophile wrote:

While translating a common Python script to D, I have found something 
interesting, so I have reduced it to a little benchmark.


Below there is the reduced Python2 code (it uses Psyco), and a little program 
to generate some test data. The timing of the first D2 version is not good 
compared to the Python-Psyco program (the generation of the *300 array is a 
quick thing), so I have created two more D2 versions to show myself that D2 
wasn't broken :-)

The reduced code looks like a syntetic benchmark, but it has about the same 
performance profile of a 60 lines long Python script (the original code was 
using xrange(0,len(...)-3,3)  instead of xrange(len(...)-3), but the situation 
doesn't change much).

It's not clear to me if the point of your post is "how do I make this gofaster?" or "Waa! D underperforms Python by default".

If it is the former, a state machine is probably the right answer forlong input strings.


start -> E

E A,C,G:-> E
E T:-> T

T A:-> TA
T C:-> E
T G:-> TG
T T:-> T

TA A,G:-> E, {++count}
TA C:-> E
TA T:-> T

TG A:-> E, {++count}
TG C,G:-> E
TG T:-> T

Since the probability of failure is about .75, 0.5, {.5 or .75}, theoverall comparison cost is about 1 + .25 + .125 = 1.375, timescompare+loop times n, plus all the function call overhead, etc.

Using a DFA should reduce the cost to 1 * index, times n, assuming youare using a small alphabet (I'm guessing that your codons areascii+uppercase only, table-size = 26).


=Austin

Re: String compare performance

Reply via email to