Very interesting! Appreciate the info. My numbers are slightly better
- as I've indicated, there is an NxN message exchange currently in the
system that needs to be removed. With that commented out, the system
scales roughly linearly with the number of processes.
At 04:31 PM 7/28/2005, you wrote:
All,
I have removed the ompi_ignores from the new bproc components I have been
working on and they are now the default for bproc. These new components
have several advantages over the old bproc component, the main ones being:
- we now provide pty support for standard I/O
- it should work better with threaded applications (although this has not
been tested)
- we also now support Scyld bproc and old versions of LANL bproc using a
serial launch, as opposed to the parallel launch used for newer bproc
versions (although I do not have a box to test this on, so any reports on
how it works would be appreciated)
Their use is the same as before: set your NODES environment variable to a
comma delimited list of the nodes to run on.
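For example (the node numbers here are hypothetical; substitute your own
bproc nodes):

```shell
# Hypothetical node IDs for illustration; use the nodes of your cluster.
export NODES=0,1,2,3

# orterun would then be invoked as usual, e.g.:
#   orterun -np 4 hostname

# Sanity check: count the entries in the comma-delimited list.
echo "$NODES" | awk -F, '{print NF}'
```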
The new launcher seems to be pretty scalable. Below are 2 charts where I
ran 'hostname' and a trivial MPI program on varying numbers of nodes with
both 1 and 2 processes per node (all times are in seconds).
Running 'hostname':

Nodes   1 per node   2 per node
  1        .162         .172
  2        .202         .224
  4        .243         .251
  8        .260         .275
 16        .305         .321
 32        .360         .412
 64        .524         .708
128       1.036        1.627
Running a trivial MPI program (MPI_Init/MPI_Finalize):

Nodes   1 per node   2 per node
  1        .33          .46
  2        .44          .63
  4        .56          .77
  8        .61          .89
 16        .71         1.1
 32        .88         1.5
 64       1.4          3.5
128       3.1          9.2
The frontend and nodes are dual Opteron 242 with 2 GB RAM and GigE.
I have been told that there are some NxN exchanges going on in the MPI
processes which are probably inflating the running times.
The launcher is split into 2 separate components. The general idea is:
1. pls_bproc is called by orterun. It figures out the process mapping and
launches orteds on the nodes.
2. pls_bproc_orted is called by orted. This module initializes either a
pty or pipes, places symlinks to them at well-known points in the
filesystem, and sets up the I/O forwarding. It then sends an ack back to
orterun.
3. pls_bproc waits for an ack to come back from the orteds, then does
several parallel launches of the application processes. The number of
launches is equal to the maximum number of processes on a node.
Let me know if there are any problems,
Tim
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel