On Monday, 14 September 2015 at 12:30:21 UTC, Fredrik Boulund wrote:
Hi,
This is my first post on Dlang forums and I don't have a lot of
experience with D (yet). I mainly code bioinformatics-stuff in
Python on my day-to-day job, but I've been toying with D for a
couple of years now. I had this idea that it'd be fun to write
a parser for a text-based tabular data format I tend to read a
lot of in my programs, but I was a bit stomped that the D
implementation I created was slower than my Python-version. I
tried running `dmd -profile` on it but didn't really understand
what I could do to make it go faster. I guess there are some
unnecessary dynamic array extensions being made but I can't
figure out how to do without them, maybe someone can help me
out? I tried making the examples as small as possible.
Here's the D code: http://dpaste.com/2HP0ZVA
Here's my Python code for comparison: http://dpaste.com/0MPBK67
clip
I am going to go off the beaten path here. If you really want
speed for a file like this, one way of getting it is to read the
file in as a single large binary array of ubytes (or in blocks if
it's too big) and parse the lines yourself. That should be fairly
easy with D's array slicing.
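
A rough sketch of what I mean, assuming the whole file fits in
memory and lines end in '\n' (optionally "\r\n"). The names
forEachLine and countLines are made up for illustration; they are
not from your code. Each line handed to the callback is a slice
into the one big buffer, so nothing is copied, decoded, or appended
per line.

```d
import std.file : read;

// Hypothetical helper: calls `dg` once per line of `buf`.
// Each line is a slice of `buf` (no copy, no UTF-8 decoding,
// no dynamic array growing). The '\n' terminator and a
// preceding '\r', if present, are not included in the slice.
void forEachLine(const(ubyte)[] buf, scope void delegate(const(ubyte)[]) dg)
{
    size_t start = 0;
    foreach (i, b; buf)
    {
        if (b == '\n')
        {
            // Strip a trailing '\r' so "\r\n" files work too.
            size_t end = (i > start && buf[i - 1] == '\r') ? i - 1 : i;
            dg(buf[start .. end]);
            start = i + 1;
        }
    }
    // Last line when the file does not end with '\n'.
    if (start < buf.length)
        dg(buf[start .. $]);
}

// Hypothetical example: read the whole file in one call and
// count its lines. Assumes the file fits in memory.
size_t countLines(string path)
{
    // std.file.read returns void[]; reinterpret it as raw bytes.
    auto data = cast(const(ubyte)[]) read(path);
    size_t n = 0;
    forEachLine(data, (const(ubyte)[] line) { ++n; });
    return n;
}
```

Reading in blocks instead (e.g. with std.stdio's File.byChunk)
works the same way, except you have to carry the incomplete last
line of each block over to the next one; that part isn't shown.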
I looked at the format and it appears that the lines are quite
simple and use a limited subset of the ASCII characters. If that
is in fact true, then you should be able to speed up reading
using this technique. If there can be UTF-8 characters in there,
or if the format can be more complex than what is shown in your
example, then please ignore my suggestion.
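
If the data really is plain ASCII, the per-line work can stay at
the byte level too. Another sketch: isAscii, forEachField and
parseUint are hypothetical helpers, and I'm assuming the columns
are separated by single tab bytes, which you should check against
your actual format. Running isAscii once over the whole buffer
tells you whether this shortcut is safe at all.

```d
// Hypothetical check: true if every byte is 7-bit ASCII, i.e.
// one byte per character, so byte-wise slicing is safe.
bool isAscii(const(ubyte)[] buf)
{
    foreach (b; buf)
        if (b >= 0x80)
            return false;
    return true;
}

// Hypothetical helper: calls `dg` for each field of `line`.
// Assumption: fields are separated by a single '\t' byte.
// Each field is a slice of `line`; empty fields are passed as-is.
void forEachField(const(ubyte)[] line, scope void delegate(const(ubyte)[]) dg)
{
    size_t start = 0;
    foreach (i, b; line)
    {
        if (b == '\t')
        {
            dg(line[start .. i]);
            start = i + 1;
        }
    }
    dg(line[start .. $]); // last field
}

// Hypothetical helper: parses a non-negative decimal integer
// straight from ASCII digits, without building a string first.
// Assumption: `field` holds only '0'..'9' (no sign, no spaces).
ulong parseUint(const(ubyte)[] field)
{
    ulong v = 0;
    foreach (b; field)
    {
        assert(b >= '0' && b <= '9', "not a decimal digit");
        v = v * 10 + (b - '0');
    }
    return v;
}
```

If the isAscii check fails, falling back to the usual
std.stdio/std.conv route is the safe choice.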