Re: Speeding up text file parser (BLAST tabular format)

CraigDillabaugh via Digitalmars-d-learn Mon, 14 Sep 2015 10:56:21 -0700

On Monday, 14 September 2015 at 12:30:21 UTC, Fredrik Boulund wrote:

Hi,
This is my first post on Dlang forums and I don't have a lot of experience with D (yet). I mainly code bioinformatics-stuff in Python on my day-to-day job, but I've been toying with D for a couple of years now. I had this idea that it'd be fun to write a parser for a text-based tabular data format I tend to read a lot of in my programs, but I was a bit stumped that the D implementation I created was slower than my Python-version. I tried running `dmd -profile` on it but didn't really understand what I can do to make it go faster. I guess there are some unnecessary dynamic array extensions being made but I can't figure out how to do without them, maybe someone can help me out? I tried making the examples as small as possible.
Here's the D code: http://dpaste.com/2HP0ZVA
Here's my Python code for comparison: http://dpaste.com/0MPBK67

clip

I am going to go off the beaten path here. If you really want speed for a file like this, one way of getting that is to read the file in as a single large binary array of ubytes (or in blocks if it's too big) and parse the lines yourself. Should be fairly easy with D's array slicing.

I looked at the format and it appears that lines are quite simple and use a limited subset of the ASCII chars. If that is in fact true, then you should be able to speed up reading using this technique. If you can have UTF8 chars in there, or if the format can be more complex than that shown in your example, then please ignore my suggestion.
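
To make the suggestion concrete, here is a minimal sketch of the read-everything-and-slice approach. It is not taken from the original code in the paste. The names `splitTabs`, `eachLine` and `maxFields` are made up for illustration, and it assumes tab-separated ASCII fields, plain '\n' line endings, at most 12 columns per line, and a file small enough to hold in memory.

```d
import std.file : read;
import std.stdio : writefln;

enum maxFields = 12; // assumption: default 12-column BLAST tabular output

/// Split `line` on tabs into `fields` (slices of `line`, no copying).
/// Returns the number of fields stored; extra tabs stay in the last field.
size_t splitTabs(const(ubyte)[] line, ref const(ubyte)[][maxFields] fields)
{
    size_t n = 0, start = 0;
    foreach (i, b; line)
    {
        if (b == '\t' && n < maxFields - 1)
        {
            fields[n++] = line[start .. i];
            start = i + 1;
        }
    }
    fields[n++] = line[start .. $];
    return n;
}

/// Call `handle` once per '\n'-terminated line, passing a slice of `data`.
void eachLine(const(ubyte)[] data, scope void delegate(const(ubyte)[]) handle)
{
    size_t start = 0;
    foreach (i, b; data)
    {
        if (b == '\n')
        {
            handle(data[start .. i]);
            start = i + 1;
        }
    }
    if (start < data.length) // final line without a trailing newline
        handle(data[start .. $]);
}

void main(string[] args)
{
    // With a path argument, read the whole file in one go;
    // otherwise fall back to a tiny made-up sample.
    const(ubyte)[] data = args.length > 1
        ? cast(const(ubyte)[]) read(args[1])
        : cast(const(ubyte)[]) "q1\ts1\t99.5\nq2\ts2\t87.0\n";

    const(ubyte)[][maxFields] fields;
    eachLine(data, (const(ubyte)[] line) {
        immutable n = splitTabs(line, fields);
        if (n >= 2)
            writefln("%s -> %s (%s fields)",
                     cast(const(char)[]) fields[0],
                     cast(const(char)[]) fields[1], n);
    });
}
```

Every field is just a slice into the one buffer, so the inner loop does not allocate or decode UTF-8. Numeric columns would still need converting, and for files too big for memory the same loops could be run over fixed-size blocks, carrying any partial last line over to the next block.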
