Hi Aaron, Thanks for your prompt response.
----- Original Message -----
From: "Aaron Darling" <[EMAIL PROTECTED]>
To: "intikhab alam" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Wednesday, February 21, 2007 12:07 AM
Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset

: I can think of two reasons why mpiBLAST may be suffering on this compute
: job. I'll start with the less-likely problem first: If the database and
: query sets contain amino acid sequences and are large, mpiblast 1.4.0
: can take a long time to compute the effective search space required for
: exact e-value calculation. If that's the problem, then you would find
: just one mpiblast process consuming 100% cpu on the rank 0 node for
: hours or days, without any output.

Is the effective search space calculation done on the master node? If so, this mpiblast job stayed on the master node for some hours, and then all the compute nodes were busy with >90% usage the whole time, with output being generated continuously until the 12th day, when I killed the job.

: The trouble is that 1.4.0 doesn't
: parallelize the effective search space calculation. I've cobbled a
: workaround for this issue, which may be satisfactory if you can tolerate
: some discrepancy in e-values between mpiblast and NCBI blast. See this
: e-mail:
: http://www.mail-archive.com/[email protected]/msg00177.html
:
: The more likely limiting factor is load imbalance on the cluster.

In this case, would you expect the job to finish on some nodes earlier than on others? In my case the job was running on all the nodes with >90% usage, and the last output I got was on the final day, when I killed the job.

: If some database fragments happen to have a large number of hits and
: others have few, and the database is distributed as one fragment per
: node, then the computation may be heavily imbalanced and may run quite
: slowly. CPU consumption as given by a CPU monitoring tool may not be
: indicative of useful work being done on the nodes since workers can do a
: timed spin-wait for new work.
: I can suggest two avenues to achieve better load balance with mpiblast
: 1.4.0. First, partition the database into more fragments, possibly two
: or three times as many as you currently have. Second, use the

Do you mean more fragments, which in turn means using more nodes? At our cluster no more than 44 nodes are allowed for parallel jobs.

: --db-replicate-count option to mpiblast. The default value for the
: db-replicate-count is 1, which indicates that mpiblast will distribute a
: single copy of your database across worker nodes. For your setup, each
: node was probably getting a single fragment. By setting

Is it not right for each node to get a single fragment of the target database (the number of nodes assigned to mpiblast = number of fragments + 2), so that the whole query dataset can be searched against that fragment on each node, with the effective search space calculation done before the search starts to give e-values comparable to blast?

: --db-replicate-count to something like 5, each fragment would be copied
: to five different compute nodes, and thus five nodes would be available
: to search fragments that happen to have lots of hits.

Do you mean that this way the nodes would be kept busy searching the query dataset against the same fragment on five compute nodes? Is this just a way to keep nodes busy until all of them have completed their searches?
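Just to check that I have understood the two suggestions correctly, is something like the following what you have in mind? The mpiformatdb and mpirun options below are written from memory and the fragment and replicate counts are only illustrative, so please correct anything I have got wrong:

    # re-format the target database into two to three times as many
    # fragments as before (the numbers are only an example)
    mpiformatdb --nfrags=126 -i target_db.fasta -p T

    # run mpiblast so that each fragment is replicated on five workers
    mpirun -np 44 mpiblast -p blastp -d target_db.fasta -i query.fasta \
        -o results.tab -m 8 --db-replicate-count 5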
: In the extreme case you could set --db-replicate-count equal to the
: number of fragments, which would be fine if per-node memory and disk
: space is substantially larger than the total size of the formatted
: database.

Is it possible in mpiblast, for cases where the size of the query dataset equals the size of the target dataset, to fragment the query dataset instead, keep the target dataset in a global/shared area, and run the searches on single nodes (with the number of nodes equal to the number of query fragments)? That way there would be no need to calculate the effective search space, since every search job sees the same target dataset. Following this approach I managed to complete this job with standard blast in under 24 hours (a rough sketch of what I did is in the P.S. below).

: I just noticed that our documentation at mpiblast.lanl.gov doesn't
: include mention of the --db-replicate-count parameter. I'm fairly
: certain it had been documented but lost when the mpiblast.org web server
: crashed and burned. In any case, that command-line parameter allows you
: to control the degree of redundancy and load balancing that mpiblast
: will use on your cluster.
:
: In your particular situation, it may also help to randomize the order of
: sequences in the database to minimize "fragment hotspots" which could
: result from a database self-search.

I did not quite follow the "fragment hotspots" point. By randomizing the order of sequences, do you mean that each node would then take a similar time to finish its searches? Otherwise the number of hits could be lower for some fragments than for others, which would end up in different completion times on different nodes?

: At the moment mpiblast doesn't have
: code to accomplish such a feat, but I think others (Jason Gans?) have
: written code for this in the past.

(In case it is on the right track, I have put a quick, untested attempt at shuffling a FASTA file in the P.P.S. below.)

Aaron, do you think Score-based MPI communication may be adding to the overall run time of the mpiblast searches?

: -Aaron
:
: -----

For Lucas:

Hi Lucas, thanks for your quick response. I ran mpiblast in --debug mode earlier, and that showed what mpiblast was doing at the different stages; everything seemed to be fine. When the mpiblast --debug output grew to more than 2 GB while the mpiblast results were still under 500 MB, I killed that job and restarted it without --debug. I expect the complete output of this job to be about 3.3 GB (which is what I obtained earlier using standard blast). If you like, I can try to find out whether that --debug file can be recovered, or I can run mpiblast in --debug mode again. I preferred the -m 8 output switch since it is easier to parse and writes much less text than the standard blast output. Could Score be an issue? Have you heard of anyone having problems running mpiblast on Score?

Again, many thanks for the informative responses from you both. I hope to hear from you soon.

Regards,
Intikhab
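P.S. For reference, here is roughly how I set up the standard blast run mentioned above. The program name, paths, piece count and file names are only illustrative, not the exact commands I used:

    # query.fasta was split beforehand into roughly equal pieces,
    # one per node (query_part_01.fasta ... query_part_44.fasta)

    # each node searches its own piece against the complete target
    # database, so every job sees the same search space and the
    # e-values need no correction
    blastall -p blastp -d /shared/target_db -i query_part_01.fasta \
        -m 8 -o query_part_01.tab

    # afterwards the per-node tables are simply concatenated
    cat query_part_*.tab > all_vs_all.tab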
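P.P.S. On randomizing the database order: would something as simple as the shuffle below be adequate, or does it need more care? It is untested, and it assumes GNU shuf is available and that one line per sequence is acceptable in the output:

    # flatten each FASTA record onto one line (header <TAB> sequence),
    # shuffle the records, then restore one header line per record
    awk '/^>/ { if (h) print h "\t" s; h = $0; s = ""; next }
         { s = s $0 }
         END { if (h) print h "\t" s }' target_db.fasta |
      shuf |
      tr '\t' '\n' > target_db_shuffled.fasta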
