Brief summary:
In r18190, I restored the --do-not-launch capability and added a
--do-not-resolve flag. This note describes how you can use them to build
and test application mappings without first obtaining an allocation or
launching the job.

Longer description:

Users and developers have both expressed a need to develop potentially
complex process mappings "offline" - i.e., before attempting to actually
launch the application. This has been particularly problematic when the
mappings are large and target managed environments, where a request for an
allocation can take quite some time to clear the queue.

We used to have a "do-not-launch" flag that would allow the system to
allocate and map a job, but then exit without attempting to launch it. This
flag was "disabled" during ORTE changes in recent months. We still had the
ability to "display-map"; however, the procedure would often hang or
abort because the system would attempt to resolve all network names in a
hostfile.

To resolve these problems, I have:

1. re-implemented the "do-not-launch" flag so that it works properly. It is
set by specifying --do-not-launch on the mpirun command line.

2. added a --do-not-resolve option to the mpirun command line that instructs
the system not to attempt to resolve network names.


For an example of how these can be used, consider the case where you want to
build a sequential map of processes versus hostfile names via the new RMAPS
seq module. It will be a big job, so you would like to ensure that the map
is correct before (a) sitting in a queue for hours/days waiting to get an
allocation, and (b) finding out it is wrong and having to abort.

What you can do is use these new options to build and test your map
-without- getting an allocation by:

1. build a hostfile that describes your desired mapping - it should contain
a list of host names, one per process, in rank order. These hosts can have
any names - we won't be trying to resolve them, so it does not matter
whether they are reachable on the network.

2. run mpirun for your job, including -mca rmaps seq -hostfile my_hosts
--do-not-launch --do-not-resolve --display-map on the command line. This
instructs mpirun to use the seq mapper, which will then use the specified
hostfile to do the mapping. It also tells mpirun to display the resulting
map so you can see where your procs would have gone, but not to attempt to
find the hosts on the network and -not- to attempt to launch the job.
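
As a rough sketch of the two steps above: the host names (nodeA, nodeB,
nodeC) and the executable name ./my_app are made up for illustration; only
the mpirun options come from this note. The script writes the hostfile and
prints the command instead of running it, so it can be tried on a machine
without Open MPI installed.

```shell
#!/bin/sh
# Step 1: hostfile with one host name per line, in rank order.
# These names are hypothetical; they need not resolve on the network
# because --do-not-resolve is given below.
cat > my_hosts <<'EOF'
nodeA
nodeB
nodeA
nodeC
EOF

# Step 2: the mpirun command line described above.
# ./my_app is a placeholder for your own executable.
MAP_CMD="mpirun -mca rmaps seq -hostfile my_hosts --do-not-launch --do-not-resolve --display-map ./my_app"

# Print the command rather than executing it.
echo "$MAP_CMD"
```

With this hostfile, the intent is for ranks 0 and 2 to be mapped to nodeA,
rank 1 to nodeB, and rank 3 to nodeC. Paste the printed command into a shell
where mpirun is available to see the actual map.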

What you'll get is a node-by-node display of which proc ranks are assigned
to each node. Once you get this looking the way you want, you can then simply
submit the job to your target cluster with confidence that the procs will be
mapped the way you wanted.


Hope that helps
Ralph

