Well, a few things to clear up the confusion about mpiblast-pio.

1. Differences between mpiblast-pio and pioblast

The techniques used in mpiblast-pio are based on the pioblast paper, with some modifications and extensions. The current release incorporates only parallel output, not parallel input.

In pioblast, the results of all queries are buffered in the workers during the search, and the output is written at the end of the computation with the COLLECTIVE parallel write method. Collective parallel write can achieve relatively good I/O performance on NFS.

In mpiblast-pio, the results of a query are written out immediately after the search of that query completes (similar to the query pipelining introduced in mpiblast1.3). The purpose is to reduce the amount of buffered data and improve scalability when searching large query files. However, this design prevents us from using collective parallel write, because collective write would require synchronizing all processors for each write operation. We therefore use the NON-COLLECTIVE individual parallel write method. Unfortunately, non-collective parallel write is not well supported by non-parallel file systems such as NFS, in either correctness or performance. So we designed a master-write strategy to benefit users without parallel file systems. That is why there are two output strategies in mpiblast-pio.

2. Performance of mpiblast-pio

mpiblast-pio offers two levels of parallelism in output processing:

1) Converting ASN results to the desired output format in parallel. This is achieved with both the parallel-write and master-write strategies.
2) Doing the actual I/O in parallel. This is available only with the parallel-write strategy.

mpiblast-pio improves output performance mostly by parallelizing the conversion of ASN results, which is performed serially on the master node in mpiblast1.4. Users can therefore still benefit from the parallel output enhancement when using the master-write strategy on NFS.

3. The limitation of master-write and ongoing work

With the master-write strategy, mpiblast-pio needs to buffer all the results of a single query in the master before writing them to the file system, so the largest per-query output that can be processed is constrained by the master's memory. Nevertheless, a master node with 1 GB of memory can efficiently handle a query with multiple hundreds of megabytes of output, which should meet the needs of most typical searches. A more scalable master-write strategy is under development and will be released in the near future. In the new design, the results of a query will be buffered in the aggregate worker memory, which removes the memory constraint mentioned above.

Hope this answers your questions and gives some idea of how mpiblast-pio works.

Thanks,
Heshan Lin

> Hi All,
>
> I want to do some performance testing with running mpiblast on NFS.
> I have downloaded both mpiblast and mpiblast-pio from the website.
> According to the pioblast paper, the latter should provide better
> performance.
> But the README file in mpiblast-pio recommends using master-write
> for result processing with NFS, which seems to abate the benefit of
> pioblast.
> I wonder if this strategy is based on the performance consideration or
> correctness (consistency) consideration. Is there any way to get good
> performance with parallel output on NFS?
>
> Thanks a lot!
>
> Jiaying Zhang
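
For readers who want a concrete picture of the two output strategies described in the reply above, here is a minimal sketch. It is not the mpiblast-pio code. It assumes threads standing in for MPI worker ranks, os.pwrite (Unix only) standing in for a non-collective MPI-IO write at an explicit offset, and a made-up format_fragment() in place of the real ASN conversion. All function names are hypothetical, and the offset bookkeeping is only one plausible way to do it.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def format_fragment(raw):
    """Hypothetical stand-in for converting one worker's ASN results to text."""
    return f"hit: {raw}\n".encode()


def master_write(path, fragments):
    """Master-write: formatting runs in parallel, one writer does all the I/O."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so the output order is deterministic.
        formatted = list(pool.map(format_fragment, fragments))
    # The whole query's output sits in the master's memory before the write.
    with open(path, "ab") as out:
        out.write(b"".join(formatted))


def _write_at(args):
    """Write one piece at its own offset, independently of the other writers."""
    fd, data, offset = args
    os.pwrite(fd, data, offset)


def parallel_write(path, fragments, base_offset=0):
    """Parallel-write: formatting AND the actual I/O both run in parallel."""
    with ThreadPoolExecutor() as pool:
        formatted = list(pool.map(format_fragment, fragments))
        sizes = [len(piece) for piece in formatted]
        # Exclusive prefix sum of the sizes gives each writer its file offset.
        offsets = [base_offset + o for o in accumulate([0] + sizes[:-1])]
        fd = os.open(path, os.O_WRONLY | os.O_CREAT)
        try:
            jobs = [(fd, piece, off) for piece, off in zip(formatted, offsets)]
            list(pool.map(_write_at, jobs))
        finally:
            os.close(fd)
    # Return where the next query's output would start.
    return base_offset + sum(sizes)
```

Both functions produce the same bytes for the same query. The difference is that master_write funnels everything through a single writer, which is why the master's memory bounds the per-query output size, while parallel_write lets each writer touch the file on its own, which is the access pattern that NFS handles poorly.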
