Well, a few things to clear up the confusion about mpiblast-pio.

1. Differences between mpiblast-pio and pioblast

The techniques used in mpiblast-pio are based on the pioblast paper, with
some modifications and extensions. In the current release, only parallel
output has been incorporated, not parallel input.

In pioblast, the results of all queries are buffered in the workers during
the search, and the output is done at the end of the computation with a
COLLECTIVE parallel write. Collective parallel writes can achieve
relatively good I/O performance on NFS.

In mpiblast-pio, the results of a query are written out immediately after
the search of that query completes (similar to the query pipelining
introduced in mpiblast1.3). The purpose is to reduce the amount of
buffered data and improve scalability when searching large query files.
However, this design prevents us from using collective parallel writes,
because a collective write requires all processors to synchronize on each
write operation. Therefore we use NON-COLLECTIVE (individual) parallel
writes. Unfortunately, non-collective parallel writes are not well
supported by non-parallel file systems such as NFS, in terms of either
correctness or performance. So we designed a master-write strategy to
benefit users without parallel file systems. That is why there are two
output strategies in mpiblast-pio.
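
To illustrate the idea of non-collective writes described above, here is a
minimal Python sketch. It is not mpiblast-pio code. It assumes each result
block's position in the shared output file is derived from the block sizes
(a running sum), and it uses os.pwrite as a stand-in for an independent
MPI-IO write. The function names are hypothetical.

```python
import os

def compute_offsets(sizes):
    """Return the byte offset at which each block starts (running sum)."""
    offsets, total = [], 0
    for size in sizes:
        offsets.append(total)
        total += size
    return offsets

def write_blocks_independently(path, blocks, order):
    """Write each block at its own offset, in the given (arbitrary) order.

    No writer waits for any other: once the offsets are known, each block
    can land in the file whenever it is ready. os.pwrite is POSIX-only and
    stands in here for an independent parallel write.
    """
    offsets = compute_offsets([len(b) for b in blocks])
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in order:
            os.pwrite(fd, blocks[i], offsets[i])
    finally:
        os.close(fd)
    return offsets
```

The file ends up in the right order no matter which block is written
first. On a local file system this is safe. The trouble described above
comes from several clients doing such writes to one file over NFS.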

2. Performance of mpiblast-pio

mpiblast-pio offers two levels of parallelism in the output processing.
1) Converting ASN results to the desired output format in parallel. This
can be achieved with both parallel-write and master-write strategies.
2) Doing actual I/O in parallel. This is only available in the
parallel-write strategy.

mpiblast-pio improves the output performance mostly by parallelizing the
conversion of ASN results, which is performed serially on the master node
in mpiblast1.4. Therefore users can still benefit from the parallel output
enhancement when using the master-write strategy on NFS.
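
As a rough illustration of the master-write case (parallel conversion,
serial I/O), here is a small Python sketch. It is only a sketch under
assumptions: threads stand in for the worker processes, and format_result
is a hypothetical placeholder for converting one ASN result to the output
format. None of these names come from mpiblast-pio.

```python
from concurrent.futures import ThreadPoolExecutor

def format_result(raw):
    """Hypothetical stand-in for converting one ASN result to text."""
    query_id, score = raw
    return f"{query_id}\t{score}\n".encode()

def master_write(raw_results, out, workers=4):
    """Format results in parallel, then write them from a single writer."""
    # Level 1: the conversion runs in parallel; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        formatted = list(pool.map(format_result, raw_results))
    # No level 2 here: the actual I/O is done serially by one writer.
    for chunk in formatted:
        out.write(chunk)
    return sum(len(chunk) for chunk in formatted)
```

Only the write loop is serial, so the conversion cost is still spread
over the workers even when a single process does the I/O.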

3. Limitation of master-write and ongoing work

With the master-write strategy, mpiblast-pio needs to buffer all the
results of a single query in the master before writing them to the file
system, so the largest per-query output that can be processed is
constrained by the master's memory. Nevertheless, a master node with 1 GB
of memory can efficiently handle a query with several hundred megabytes
of output, which should meet the needs of most typical searches.
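
The constraint can be pictured with a small Python sketch. It assumes you
have an estimate of how many bytes of output each worker produces per
query. The function names and the input layout are hypothetical, not part
of mpiblast-pio.

```python
def peak_master_buffer(per_query_sizes):
    """Largest number of bytes the master must hold at once.

    per_query_sizes maps a query id to the list of output sizes (bytes)
    contributed by each worker. The master buffers one query at a time,
    so the peak is the largest per-query total.
    """
    return max((sum(sizes) for sizes in per_query_sizes.values()), default=0)

def queries_exceeding(per_query_sizes, budget_bytes):
    """Query ids whose total output would not fit in the given budget."""
    return sorted(q for q, sizes in per_query_sizes.items()
                  if sum(sizes) > budget_bytes)
```

What matters is the largest single query, not the total output of the
whole run, because results are written out query by query.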

A more scalable master-write strategy is under development and will be
released in the near future. In the new design, the results of a query
will be buffered in the aggregate worker memory, which removes the
above-mentioned memory constraint.


Hope this answers your questions and gives some idea of how mpiblast-pio
works.

Thanks,
Heshan Lin

> Hi All,
>
> I want to do some performance testing with running mpiblast on NFS.
> I have downloaded both mpiblast and mpiblast-pio from the website.
> According to the pioblast paper, the latter should provide better
> performance.
> But the README file in mpiblast-pio recommends using master-write
> for result processing with NFS, which seems to abate the benefit of
> pioblast.
> I wonder if this strategy is based on the performance consideration or
> correctness (consistency) consideration. Is there any way to get good
> performance with parallel output on NFS?
>
> Thanks a lot!
>
> Jiaying Zhang




_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
