All,

I have removed the ompi_ignores from the new bproc components I have been working on, and they are now the default for bproc. These new components have several advantages over the old bproc component, mainly:

- We now provide pty support for standard IO.
- It should work better with threaded applications (although this has not been tested).
- We now support Scyld bproc and old versions of LANL bproc using a serial launch, as opposed to the parallel launch used for newer bproc versions. (I do not have a box to test this on, so any reports on how it works would be appreciated.)

Their use is the same as before: set your NODES environment variable to a comma-delimited list of the nodes to run on.
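For example (the node numbers and application name here are just an illustration), something like:

    export NODES=0,1,2,3
    orterun -np 4 ./my_app

would launch one process on each of the four listed nodes.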
The new launcher seems to be pretty scalable. Below are two tables where I ran 'hostname' and a trivial MPI program on varying numbers of nodes, with both 1 and 2 processes per node (all times are in seconds).

Running hostname:

    Nodes    1 per node    2 per node
      1        .162          .172
      2        .202          .224
      4        .243          .251
      8        .260          .275
     16        .305          .321
     32        .360          .412
     64        .524          .708
    128       1.036         1.627

Running a trivial MPI program (MPI_Init/MPI_Finalize):

    Nodes    1 per node    2 per node
      1        .33           .46
      2        .44           .63
      4        .56           .77
      8        .61           .89
     16        .71          1.1
     32        .88          1.5
     64       1.4           3.5
    128       3.1           9.2

The frontend and the nodes are dual Opteron 242s with 2 GB RAM and GigE. I have been told that there are some NxN exchanges going on in the MPI processes, which are probably tainting the running times.

The launcher is split into 2 separate components. The general idea is:

1. pls_bproc is called by orterun. It figures out the process mapping and launches orteds on the nodes.
2. pls_bproc_orted is called by orted. This module initializes either a pty or pipes, places symlinks to them at well-known points in the filesystem, and sets up the IO forwarding. It then sends an ack back to orterun. (A rough sketch of the pty/symlink idea is in the P.S. below.)
3. pls_bproc waits for an ack to come back from the orteds, then does several parallel launches of the application processes. The number of launches is equal to the maximum number of processes on any one node.

Let me know if there are any problems,

Tim
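P.S. For anyone curious about the IO setup in step 2, here is a minimal sketch of the pty-plus-symlink idea. This is not the actual pls_bproc_orted code: the link path is a made-up example, and the real module also handles the pipe fallback and the forwarding back to orterun.

    /* Sketch: create a pty for a child's standard IO and publish the
     * slave side at a well-known filesystem path.
     * Compile on Linux with: gcc pty_sketch.c -lutil */
    #include <stdio.h>
    #include <unistd.h>
    #include <pty.h>        /* openpty() */

    int main(void)
    {
        int master, slave;
        char slave_name[64];

        /* Create the pseudo-terminal pair; a real launcher would fall
         * back to pipe() if no ptys are available. */
        if (openpty(&master, &slave, slave_name, NULL, NULL) < 0) {
            perror("openpty");
            return 1;
        }

        /* Publish the slave end at a well-known path so a later-launched
         * application process can attach its stdio to it.
         * "/tmp/openmpi-io-0" is a hypothetical name, not the real
         * naming scheme. */
        const char *link_path = "/tmp/openmpi-io-0";
        unlink(link_path);
        if (symlink(slave_name, link_path) < 0) {
            perror("symlink");
            return 1;
        }

        /* The daemon would now relay anything read from 'master'
         * back to orterun. */
        printf("pty slave %s published at %s\n", slave_name, link_path);
        return 0;
    }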