On 11/28/2010 04:48 PM, bearophile wrote:
Pelle Månsson:

I don't have your data set,

In my first post I have explained how to create the data set, using a little 
Python script that I have listed there. You just need a Python2 interpreter.


but for me using random data this was within a factor 2 of your #3, without any 
fiddly code.

Thank you for your code. I have added your version, plus a modified version 
that works on ubytes, because it's faster, #9 and #10:

-------------------------

// D version #9
import std.file: read;
import std.c.stdio: printf;
import std.algorithm: find;
import std.array: empty, popFront;

int test(char[] data) {
     int count;
     while (true) {
         data = data.find("TAG", "TGA", "TAA")[0];
         if (data.empty)
             return count;
          count++;
          data.popFront();
      }
}

void main() {
     char[] data0 = cast(char[])read("data.txt");
     int n = 300;
     char[] data = new char[data0.length * n];
     for (size_t pos; pos<  data.length; pos += data0.length)
         data[pos .. pos + data0.length] = data0;

     printf("%d\n", test(data));
}

-------------------------

// D version #10
import std.file: read;
import std.c.stdio: printf;
import std.algorithm: find;
import std.array: empty, popFront;

int test(ubyte[] data) {
     int count;
     while (true) {
         data = data.find("TAG", "TGA", "TAA")[0];
         if (data.empty)
             return count;
          count++;
          data.popFront();
      }
}

void main() {
     ubyte[] data0 = cast(ubyte[])read("data.txt");
     int n = 300;
     ubyte[] data = new ubyte[data0.length * n];
     for (size_t pos; pos<  data.length; pos += data0.length)
         data[pos .. pos + data0.length] = data0;

     printf("%d\n", test(data));
}

-------------------------

Timings, dmd compiler, best of 4, seconds:
   D #1:  5.72
   D #9:  5.04
   D #10: 3.31
   D #4:  1.84
   D #5:  1.73
   Psy:   1.59
   D #8:  1.51
   D #7:  0.56
   D #2:  0.55
   D #6:  0.47
   D #3:  0.34


So on this PC it's not much fast. And generally I don't like to use find, empty 
and popFront to solve this problem, it's quite unnatural. Here I have explained 
what I think is a good enough solution to this performance problem:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=123044

Bye,
bearophile

Measuring incorrectly in a performance test is a silly mistake, I was indeed wrong about the factor two thing!

I thought find was specialized for chars so it wouldn't require casting to ubyte. Maybe I was wrong again.

Thank you for your engaging performance oriented posts. :-)

Reply via email to