Hi Aaron, Thanks for your prompt response.
----- Original Message -----
From: "Aaron Darling" <[EMAIL PROTECTED]>
To: "intikhab alam" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Wednesday, February 21, 2007 12:07 AM
Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset

: I can think of two reasons why mpiBLAST may be suffering on this compute
: job. I'll start with the less-likely problem first: If the database and
: query sets contain amino acid sequences and are large, mpiblast 1.4.0
: can take a long time to compute the effective search space required for
: exact e-value calculation. If that's the problem, then you would find
: just one mpiblast process consuming 100% cpu on the rank 0 node for
: hours or days, without any output.

Is the effective search space calculation done on the master node? If so, this mpiblast job stayed on the master node for some hours, and then all the compute nodes were busy with >90% usage the whole time, with output being generated continuously until the 12th day, when I killed the job.

: The trouble is that 1.4.0 doesn't
: parallelize the effective search space calculation. I've cobbled a
: workaround for this issue, which may be satisfactory if you can tolerate
: some discrepancy in e-values between mpiblast and NCBI blast. See this
: e-mail:
: http://www.mail-archive.com/[email protected]/msg00177.html
:
: The more likely limiting factor is load imbalance on the cluster.

In this case, would you expect the job to finish on some nodes earlier than on others? In my case the job was running on all the nodes with >90% usage, and the last output I got was on the final day, when I killed the job.

: If some database fragments happen to have a large number of hits and
: others have few, and the database is distributed as one fragment per
: node, then the computation may be heavily imbalanced and may run quite
: slowly. CPU consumption as given by a CPU monitoring tool may not be
: indicative of useful work being done on the nodes since workers can do a
: timed spin-wait for new work.
: I can suggest two avenues to achieve better load balance with mpiblast
: 1.4.0. First, partition the database into more fragments, possibly two
: or three times as many as you currently have. Second, use the

Do you mean more fragments, which in turn means using more nodes? At our cluster no more than 44 nodes are allowed for parallel jobs.

: --db-replicate-count option to mpiblast. The default value for the
: db-replicate-count is 1, which indicates that mpiblast will distribute a
: single copy of your database across worker nodes. For your setup, each
: node was probably getting a single fragment. By setting

Is it not right for each node to get a single fragment of the target database (the number of nodes assigned to mpiblast = number of fragments + 2), so that the whole query dataset can be searched against that fragment on each node, with the effective search space calculation done before the search starts to give e-values comparable to blast?

: --db-replicate-count to something like 5, each fragment would be copied
: to five different compute nodes, and thus five nodes would be available
: to search fragments that happen to have lots of hits.

Do you mean that this way the nodes would be kept busy searching the query dataset against the same fragment on five compute nodes? Is this just a way to keep nodes busy until all of them have completed their searches?
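Just to check that I have understood the two suggestions correctly, is something like the following what you have in mind? The mpiformatdb and mpirun options below are written from memory and the fragment and replicate counts are only illustrative, so please correct anything I have got wrong:

    # re-format the target database into two to three times as many
    # fragments as before (the numbers are only an example)
    mpiformatdb --nfrags=126 -i target_db.fasta -p T

    # run mpiblast so that each fragment is replicated on five workers
    mpirun -np 44 mpiblast -p blastp -d target_db.fasta -i query.fasta \
        -o results.tab -m 8 --db-replicate-count 5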
: In the extreme case you could set --db-replicate-count equal to the
: number of fragments, which would be fine if per-node memory and disk
: space is substantially larger than the total size of the formatted
: database.

Is it possible in mpiblast, for cases where the size of the query dataset equals the size of the target dataset, to fragment the query dataset instead, keep the target dataset in a global/shared area, and run the searches on single nodes (with the number of nodes equal to the number of query fragments)? That way there would be no need to calculate the effective search space, since every search job sees the same target dataset. Following this approach I managed to complete this job with standard blast in under 24 hours (a rough sketch of what I did is in the P.S. below).

: I just noticed that our documentation at mpiblast.lanl.gov doesn't
: include mention of the --db-replicate-count parameter. I'm fairly
: certain it had been documented but lost when the mpiblast.org web server
: crashed and burned. In any case, that command-line parameter allows you
: to control the degree of redundancy and load balancing that mpiblast
: will use on your cluster.
:
: In your particular situation, it may also help to randomize the order of
: sequences in the database to minimize "fragment hotspots" which could
: result from a database self-search.

I did not quite follow the "fragment hotspots" point. By randomizing the order of sequences, do you mean that each node would then take a similar time to finish its searches? Otherwise the number of hits could be lower for some fragments than for others, which would end up in different completion times on different nodes?

: At the moment mpiblast doesn't have
: code to accomplish such a feat, but I think others (Jason Gans?) have
: written code for this in the past.

(In case it is on the right track, I have put a quick, untested attempt at shuffling a FASTA file in the P.P.S. below.)

Aaron, do you think Score-based MPI communication may be adding to the overall run time of the mpiblast searches?

: -Aaron
:
: -----

For Lucas:

Hi Lucas, thanks for your quick response. I ran mpiblast in --debug mode earlier, and that showed what mpiblast was doing at the different stages; everything seemed to be fine. When the mpiblast --debug output grew to more than 2 GB while the mpiblast results were still under 500 MB, I killed that job and restarted it without --debug. I expect the complete output of this job to be about 3.3 GB (which is what I obtained earlier using standard blast). If you like, I can try to find out whether that --debug file can be recovered, or I can run mpiblast in --debug mode again. I preferred the -m 8 output switch since it is easier to parse and writes much less text than the standard blast output. Could Score be an issue? Have you heard of anyone having problems running mpiblast on Score?

Again, many thanks for the informative responses from you both. I hope to hear from you soon.

Regards,
Intikhab
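P.S. For reference, here is roughly how I set up the standard blast run mentioned above. The program name, paths, piece count and file names are only illustrative, not the exact commands I used:

    # query.fasta was split beforehand into roughly equal pieces,
    # one per node (query_part_01.fasta ... query_part_44.fasta)

    # each node searches its own piece against the complete target
    # database, so every job sees the same search space and the
    # e-values need no correction
    blastall -p blastp -d /shared/target_db -i query_part_01.fasta \
        -m 8 -o query_part_01.tab

    # afterwards the per-node tables are simply concatenated
    cat query_part_*.tab > all_vs_all.tab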
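P.P.S. On randomizing the database order: would something as simple as the shuffle below be adequate, or does it need more care? It is untested, and it assumes GNU shuf is available and that one line per sequence is acceptable in the output:

    # flatten each FASTA record onto one line (header <TAB> sequence),
    # shuffle the records, then restore one header line per record
    awk '/^>/ { if (h) print h "\t" s; h = $0; s = ""; next }
         { s = s $0 }
         END { if (h) print h "\t" s }' target_db.fasta |
      shuf |
      tr '\t' '\n' > target_db_shuffled.fasta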
