On Monday, 14 September 2015 at 13:10:50 UTC, Edwin van Leeuwen wrote:
> Two things that you could try:
>
> First, hitlists.byKey can be expensive (especially if hitlists is big). Instead use:
>
>     foreach (key, value; hitlists)
>
> Also the filter.array.length is quite expensive. You could use count instead:
>
>     import std.algorithm : count;
>     value.count!(h => h.pid >= (max_pid - max_pid_diff));
I didn't know that hitlists.byKey was that expensive; that's exactly
the kind of feedback I was hoping for. I'm just grasping at straws in
the online documentation when I want to do things, and with my Python
background it feels as if I can still get things working that way.
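For anyone else following along: the cost difference is that byKey yields only keys, so reading each value costs a second hash lookup, whereas foreach over key/value pairs walks the table once. A minimal sketch with a hypothetical hitlists table (names assumed, not the actual code):

```d
import std.stdio : writeln;

void main()
{
    // Hypothetical hit lists: query name -> percent identities of its hits.
    double[][string] hitlists = [
        "q1": [97.0, 88.5],
        "q2": [92.3]
    ];

    // byKey yields keys only, so getting the value is a second lookup:
    foreach (key; hitlists.byKey)
    {
        auto value = hitlists[key]; // extra hash lookup per key
        writeln(key, ": ", value.length, " hits");
    }

    // Iterating key/value pairs walks the table once, no extra lookup:
    foreach (key, value; hitlists)
        writeln(key, ": ", value.length, " hits");
}
```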
I realize the filter.array.length construct is indeed expensive. I
find it especially horrendous that the code I've written needs to
allocate a big dynamic array that will most likely be cut down quite
drastically in this step. Unfortunately I haven't figured out a good
way to avoid storing the intermediate results: the input file might
not be sorted, so I can never know whether another hit for an already
encountered "query" is still to come.
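One thing worth noting (a sketch under assumed names, not your actual code): std.algorithm.filter itself is lazy, so the allocation only happens at the .array call. The per-query hit lists still have to be kept until the whole unsorted file is read, but if the downstream step can consume a range, the filtered copy per query can be skipped:

```d
import std.algorithm : filter;
import std.array : array;

// Hypothetical hit record, as assumed from the snippet above.
struct Hit { string query; double pid; }

void main()
{
    immutable max_pid = 100.0;
    immutable max_pid_diff = 5.0;
    Hit[] value = [Hit("q1", 97.0), Hit("q1", 88.5), Hit("q1", 96.2)];

    // Lazy: no allocation yet, just a filtered view over `value`.
    auto kept = value.filter!(h => h.pid >= (max_pid - max_pid_diff));

    // Only materialize if a later step really needs a random-access array:
    auto keptArray = kept.array;
    assert(keptArray.length == 2); // 97.0 and 96.2 pass the 95.0 cutoff
}
```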
But the main reason I didn't just count the values as you suggest is
that I need the filtered hits in later downstream analysis: the
filtered hits for each query are used as input to a lowest common
ancestor algorithm on the taxonomic tree (of life).
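For context, a lowest common ancestor on a taxonomy stored as parent pointers can be found by walking each node's ancestor chain to the root. A toy sketch with a hypothetical taxid/parent table (this is not the actual pipeline, just the idea):

```d
import std.algorithm : canFind;

// Hypothetical parent table: taxid -> parent taxid (root points to itself).
int lca(int a, int b, const int[int] parent)
{
    // Collect a's ancestors up to the root.
    int[] chain;
    for (int n = a; ; n = parent[n])
    {
        chain ~= n;
        if (parent[n] == n) break;
    }
    // Walk up from b until we hit something on a's chain.
    for (int n = b; ; n = parent[n])
    {
        if (chain.canFind(n)) return n;
        if (parent[n] == n) break;
    }
    return chain[$ - 1]; // fall back to the root
}

void main()
{
    // Tiny toy taxonomy: 1 is the root.
    int[int] parent = [1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3];
    assert(lca(4, 5, parent) == 2); // siblings under node 2
    assert(lca(4, 6, parent) == 1); // only share the root
}
```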