On Monday, 14 September 2015 at 12:30:21 UTC, Fredrik Boulund wrote:
Hi,

This is my first post on Dlang forums and I don't have a lot of experience with D (yet). I mainly code bioinformatics-stuff in Python on my day-to-day job, but I've been toying with D for a couple of years now. I had this idea that it'd be fun to write a parser for a text-based tabular data format I tend to read a lot of in my programs, but I was a bit stumped that the D implementation I created was slower than my Python version. I tried running `dmd -profile` on it but didn't really understand what I can do to make it go faster. I guess there are some unnecessary dynamic array extensions being made but I can't figure out how to do without them, maybe someone can help me out? I tried making the examples as small as possible.

Here's the D code: http://dpaste.com/2HP0ZVA
Here's my Python code for comparison: http://dpaste.com/0MPBK67

clip

I am going to go off the beaten path here. If you really want speed for a file like this, one way of getting it is to read the file in as a single large binary array of ubytes (or in blocks if it's too big) and parse the lines yourself. That should be fairly easy with D's array slicing.

I looked at the format and it appears that the lines are quite simple and use a limited subset of the ASCII chars. If that is in fact true, then you should be able to speed up reading using this technique. If you can have UTF-8 chars in there, or if the format can be more complex than what is shown in your example, then please ignore my suggestion.
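
To make the idea concrete, here is a rough sketch of what I mean, not a tuned implementation. It assumes the whole file fits in memory, that lines end in '\n', and that fields are tab-separated (I am guessing at the delimiter; adjust to your format). The helper names eachLine and nthField are made up for this example. Every line and field is just a slice of the one big buffer, so nothing is copied and no per-line arrays are grown.

```d
import std.file : read;
import std.stdio : writefln;

// Calls `sink` once per line. Each line is a slice into `data`
// (no copy), with the terminating '\n' left out.
void eachLine(const(ubyte)[] data, scope void delegate(const(ubyte)[]) sink)
{
    size_t start = 0;
    foreach (i, b; data)
    {
        if (b == '\n')
        {
            sink(data[start .. i]);
            start = i + 1;
        }
    }
    // Last line when the file does not end with a newline.
    if (start < data.length)
        sink(data[start .. $]);
}

// Returns the n-th (0-based) tab-separated field of `line` as a slice,
// or null if the line has too few fields. Tab is an assumed delimiter.
const(ubyte)[] nthField(const(ubyte)[] line, size_t n)
{
    size_t start = 0;
    foreach (i, b; line)
    {
        if (b == '\t')
        {
            if (n == 0)
                return line[start .. i];
            --n;
            start = i + 1;
        }
    }
    return n == 0 ? line[start .. $] : null;
}

void main(string[] args)
{
    if (args.length < 2)
        return;
    // std.file.read returns void[]; view it as raw bytes.
    auto data = cast(const(ubyte)[]) read(args[1]);
    size_t lines;
    eachLine(data, (line) { ++lines; });
    writefln("%s lines", lines);
}
```

Inside the delegate you would pull out whichever columns you need with nthField and convert them yourself. If the file uses Windows line endings you would also have to strip a trailing '\r' from each line.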
