Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset

Aaron Darling Tue, 20 Feb 2007 16:09:26 -0800

Hi Intikhab,

intikhab alam wrote:
> Hi everybody,
>
> I have a dataset of 347899 protein sequences that I want to compare
> to each other (an all-against-all blast). I have access to a compute
> cluster that runs Score (version 5.8.4.r3) as its MPI environment and
> has 25 nodes, each with 4 cores and 8 GB of RAM.
>
> We have the latest version of mpiblast installed. I started an mpiblast
> job (comparing the 347899 sequences against each other) on 44
> processors using the following command lines:
>
>
> mpiformatdb -i 36FungalJGIanigNbcin_M40 --nfrags=42 -p T --skip-reorder
>
> mpisub 44 /usr/local/mpiblast_tool/bin/mpiblast -p blastp \
>     -d 36FungalJGIanigNbcin_M40 \
>     -i /users/zzalssn4/scratch/mpiblast/work/36FungalJGIanigNbcin_M40 \
>     -m 8 -e 1e-5 \
>     -o /users/zzalssn4/scratch/mpiblast/work/36FungalJGIanigNbcin_M40.outF42C44
>
> This job ran for about 12 days and produced only about 21% (10122202)
> of the 47342483 known significant matches, although all processes
> were still running at full load (>90% CPU usage) on every allocated
> processor.
>
>
> The same all-against-all blast job using standard blast on 36
> processors, where I split the dataset into 36 chunks and blasted each
> chunk against the complete dataset on its own processor, completed in
> less than 24 hours and produced 47342483 significant sequence matches.
>
> Maybe I am missing something about running mpiblast properly, so I
> would appreciate advice on how to improve mpiblast's running time on a
> dataset of this size.
>
> Hope to hear from you soon.
>
> Regards,
>
> Intikhab
>   
I can think of two reasons why mpiBLAST may be suffering on this compute 
job.  I'll start with the less-likely problem first: If the database and 
query sets contain amino acid sequences and are large, mpiblast 1.4.0 
can take a long time to compute the effective search space required for 
exact e-value calculation.  If that's the problem, then you would find 
just one mpiblast process consuming 100% cpu on the rank 0 node for 
hours or days, without any output.  The trouble is that 1.4.0 doesn't 
parallelize the effective search space calculation.  I've cobbled together a
workaround for this issue, which may be satisfactory if you can tolerate
some discrepancy in e-values between mpiblast and NCBI blast.  See this 
e-mail: 
http://www.mail-archive.com/[email protected]/msg00177.html
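
To make the per-query nature of this calculation concrete, here is a minimal Python sketch. It assumes the classic one-step length adjustment, l = ln(K*m*n)/H, with K and H set to commonly quoted BLOSUM62 (gap 11/1) values; NCBI BLAST and mpiBLAST 1.4.0 use a more refined calculation, so treat the numbers as illustrative only. The function names and the 150-million-residue database size are hypothetical.

```python
import math


def effective_search_space(query_len, db_letters, db_seqs, K=0.041, H=0.14):
    """Approximate effective search space for a single query.

    Uses the classic one-step length adjustment l = ln(K*m*n) / H.
    K and H default to commonly quoted BLOSUM62 (gap 11/1) values; these
    defaults are an assumption, not values taken from mpiblast.
    """
    ell = math.log(K * query_len * db_letters) / H  # expected HSP length
    m_eff = max(query_len - ell, 1.0)               # shorten the query
    n_eff = max(db_letters - db_seqs * ell, 1.0)    # shorten every db sequence
    return m_eff * n_eff


def all_search_spaces(query_lengths, db_letters, db_seqs):
    # Each query is independent of the others, so this loop could be split
    # across ranks; run serially on one rank it scales with the query count.
    return [effective_search_space(q, db_letters, db_seqs) for q in query_lengths]


# Hypothetical inputs: three query lengths and a database of 347899
# sequences totalling an assumed 150 million residues.
print(all_search_spaces([350, 900, 1500], db_letters=150_000_000, db_seqs=347_899))
```

Every query needs its own value, and each value depends on database-wide totals. The per-query work is independent, which is why it parallelizes easily in principle even though 1.4.0 runs it on rank 0 alone.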


The more likely limiting factor is load imbalance on the cluster. 
If some database fragments happen to have a large number of hits and 
others have few, and the database is distributed as one fragment per 
node, then the computation may be heavily imbalanced and may run quite 
slowly.  High CPU usage in a monitoring tool does not necessarily mean
the nodes are doing useful work, because idle workers do a timed
spin-wait for new work, which itself consumes CPU.

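
A toy model shows why one fragment per node hurts when a few fragments are "hot". The Python sketch below, with hypothetical work units and function names, compares a static layout (each of 42 workers searches only its own fragment) against greedy dynamic scheduling of 126 smaller fragments. It assumes hits are spread evenly within each fragment and ignores communication and I/O, so it is not a model of mpiblast's actual scheduler.

```python
import heapq
import random


def static_makespan(frag_costs, n_workers):
    """Each worker searches only the fragment(s) it was given up front."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(frag_costs):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, n_workers):
    """Greedy list scheduling: each task goes to the first worker to go idle."""
    finish = [0.0] * n_workers          # a list of equal values is a valid heap
    for cost in task_costs:
        t = heapq.heappop(finish)       # earliest-free worker
        heapq.heappush(finish, t + cost)
    return max(finish)


def split_fragments(frag_costs, k):
    # Assumes hits are spread evenly within each fragment, so splitting a
    # fragment into k pieces divides its work by k.
    return [cost / k for cost in frag_costs for _ in range(k)]


rng = random.Random(0)
costs = [rng.uniform(1.0, 2.0) for _ in range(42)]   # hypothetical work units
for hot in (3, 17, 29):                               # a few "hot" fragments
    costs[hot] *= 20

print("42 fragments, 42 workers :", round(static_makespan(costs, 42), 1))
print("126 fragments, 42 workers:", round(dynamic_makespan(split_fragments(costs, 3), 42), 1))
```

In the static layout the hottest fragment alone sets the finishing time; smaller pieces let idle workers pick up the slack, which is the idea behind using more fragments.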
I can suggest two avenues to achieve better load balance with mpiblast 
1.4.0.  First, partition the database into more fragments, possibly two 
or three times as many as you currently have.  Second, use the 
--db-replicate-count option to mpiblast.  The default value for the 
db-replicate-count is 1, which indicates that mpiblast will distribute a 
single copy of your database across worker nodes.  For your setup, each 
node was probably getting a single fragment.  By setting 
--db-replicate-count to something like 5, each fragment would be copied  
to five different compute nodes, and thus five nodes would be available 
to search fragments that happen to have lots of hits.  In the extreme 
case you could set --db-replicate-count equal to the number of
fragments, which is fine as long as each node's memory and disk space are
substantially larger than the total size of the formatted database.
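
The effect of replication can be sketched the same way. In this hypothetical Python model, fragment f is copied to `replicas` workers and each (query, fragment) task goes to whichever copy-holder is least loaded; the placement rule, the greedy choice, and the work units are assumptions for illustration, not mpiblast's internals. `per_worker_storage` gives a rough per-worker footprint, which reaches the whole database once every worker holds every fragment.

```python
def replicated_makespan(frag_costs, n_queries, n_workers, replicas):
    """Toy model of --db-replicate-count: fragment f is copied to `replicas`
    workers, and each (query, fragment) task may run on any of them.

    Placement (workers f, f+1, ..., f+replicas-1, mod n_workers) and the
    greedy least-loaded choice are assumptions, not mpiblast's algorithm.
    """
    holders = [[(f + j) % n_workers for j in range(replicas)]
               for f in range(len(frag_costs))]
    finish = [0.0] * n_workers
    for _ in range(n_queries):
        for f, cost in enumerate(frag_costs):
            w = min(holders[f], key=lambda i: finish[i])  # least-loaded holder
            finish[w] += cost
    return max(finish)


def per_worker_storage(db_bytes, replicas, n_workers):
    # `replicas` copies spread evenly over the workers; a worker can hold at
    # most one copy of each fragment, so the footprint caps at the full db.
    return min(replicas, n_workers) * db_bytes / n_workers


costs = [1.0] * 42              # hypothetical per-task work for 42 fragments
for hot in (3, 17, 29):
    costs[hot] = 20.0           # three fragments with many hits

for r in (1, 5, 42):
    print(r, replicated_makespan(costs, n_queries=10, n_workers=42, replicas=r))
print(per_worker_storage(db_bytes=1.0, replicas=42, n_workers=42))  # whole db per worker
```

With replicas=1 the worker holding the hottest fragment finishes last; with 5, that fragment's tasks are shared among five workers, at the cost of five times the total storage.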

I just noticed that our documentation at mpiblast.lanl.gov doesn't
mention the --db-replicate-count parameter.  I'm fairly certain it was
documented once but was lost when the mpiblast.org web server crashed and
burned.  In any case, that command-line parameter lets you
to control the degree of redundancy and load balancing that mpiblast 
will use on your cluster.

In your particular situation, it may also help to randomize the order of
sequences in the database to minimize the "fragment hotspots" that a
database self-search can produce: if related sequences sit next to each
other in the input file, they tend to end up in the same fragment, and
that fragment then collects a disproportionate share of the hits.  At the
moment mpiblast doesn't include code to do this, but I think others
(Jason Gans?) have written such code in the past.
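
If no existing tool is at hand, shuffling a FASTA file before running mpiformatdb is straightforward. Below is a minimal Python sketch, assuming a plain FASTA file that fits in memory; the function names are hypothetical, and the fixed seed only makes the shuffle reproducible.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence_lines) pairs from an iterable of text lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:        # skip anything before the first header
            seq.append(line)
    if header is not None:
        yield header, seq


def shuffled_records(records, seed=12345):
    """Return the records in a reproducible random order."""
    records = list(records)
    random.Random(seed).shuffle(records)
    return records


def shuffle_fasta(in_path, out_path, seed=12345):
    # Reads the whole file into memory, shuffles the records, writes them back.
    with open(in_path) as fh:
        records = shuffled_records(read_fasta(fh), seed)
    with open(out_path, "w") as out:
        for header, seq in records:
            out.write(header + "\n")
            for chunk in seq:
                out.write(chunk + "\n")
```

For example, shuffle_fasta("36FungalJGIanigNbcin_M40", "36FungalJGIanigNbcin_M40.shuffled") writes a shuffled copy (the output name is hypothetical), which can then be formatted with mpiformatdb in place of the original file.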

-Aaron


_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
