Hmmm...well, things seem to work just fine for me:

[rhc@odin ~/ompi-hwloc]$ mpirun -np 2 -bynode -mca plm rsh hostname
odin090.cs.indiana.edu
odin091.cs.indiana.edu

[rhc@odin mpi]$ mpirun -np 2 -bynode -mca plm rsh ./hello_nodename
Hello, World, I am 1 of 2 on host odin091.cs.indiana.edu from app number 0 
universe size 8
Hello, World, I am 0 of 2 on host odin090.cs.indiana.edu from app number 0 
universe size 8


I'll get a fresh checkout and see if I can replicate from that...

On Nov 17, 2011, at 7:42 PM, George Bosilca wrote:

> I guess I hit one of those corner cases that didn't get tested. I can't 
> start any apps (not even hostname) after this commit using the rsh PLM (as 
> soon as I add a hostfile). mpirun is blocked in an infinite loop (after it 
> has spawned the daemons) in orte_rmaps_base_compute_vpids. Attaching with gdb 
> shows that cnt is never incremented, so mpirun is stuck forever in 
> the while loop at line 397.
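
(A rough, standalone sketch of the failure shape described above - not the actual orte_rmaps code, and every name in it (assign_vpids, fake_proc) is invented for illustration: if a full sweep over the proc list assigns nothing, cnt never advances and the while loop spins forever.)

#include <stdio.h>

struct fake_proc {
    int node;      /* index of the node this proc was mapped to */
    int vpid;      /* -1 until a vpid has been assigned */
};

/* Hand out vpids round-robin by node until all num_procs have one.
 * cnt is only bumped when a proc is actually assigned, so a sweep
 * that assigns nothing leaves cnt unchanged and the outer while
 * loop never terminates - the symptom reported above. */
static void assign_vpids(struct fake_proc *procs, int num_procs, int num_nodes)
{
    int cnt = 0;
    int vpid = 0;

    while (cnt < num_procs) {
        for (int node = 0; node < num_nodes; node++) {
            for (int i = 0; i < num_procs; i++) {
                if (procs[i].node == node && procs[i].vpid < 0) {
                    procs[i].vpid = vpid++;
                    cnt++;          /* only incremented on a successful match */
                    break;
                }
            }
        }
    }
}

int main(void)
{
    /* Two procs mapped --bynode onto nodes 0 and 1: terminates normally. */
    struct fake_proc ok[2] = { { 0, -1 }, { 1, -1 } };
    assign_vpids(ok, 2, 2);
    printf("vpids: %d %d\n", ok[0].vpid, ok[1].vpid);

    /* A proc recorded against a node index the sweep never visits
     * (e.g. node 2 when num_nodes is 2) would hang in assign_vpids,
     * which is what mpirun spinning in compute_vpids looks like. */
    return 0;
}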
> 
> I used "mpirun -np 2 --bynode ./tp_lb_ub_ng" to start my application, and I 
> have a machine file containing two nodes:
> 
> node01 slots=8
> node02 slots=8
> 
> In addition, CTRL+C seems to be broken …
> 
>  george.
> 
> Begin forwarded message:
> 
>> Author: rhc
>> Date: 2011-11-14 22:40:11 EST (Mon, 14 Nov 2011)
>> New Revision: 25476
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25476
>> 
>> Log:
>> At long last, the fabled revision to the affinity system has arrived. A more 
>> detailed explanation of how this all works will be presented here:
>> 
>> https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
>> 
>> The wiki page is incomplete at the moment, but I hope to complete it over 
>> the next few days. I will provide updates on the devel list. As the wiki 
>> page states, the default and most commonly used options remain unchanged 
>> (except as noted below). New, esoteric and complex options have been added, 
>> but unless you are a true masochist, you are unlikely to use many of them 
>> beyond perhaps some initial curiosity-motivated experimentation.
>> 
>> In a nutshell, this commit revamps the map/rank/bind procedure to take into 
>> account topology info on the compute nodes. I have, for the most part, 
>> preserved the default behaviors, with three notable exceptions:
>> 
>> 1. I have at long last bowed my head in submission to the system admins of 
>> managed clusters. For years, they have complained about our default of 
>> allowing users to oversubscribe nodes - i.e., to run more processes on a 
>> node than allocated slots. Accordingly, I have modified the default 
>> behavior: if you are running off of hostfile/dash-host allocated nodes, then 
>> the default is to allow oversubscription. If you are running off of 
>> RM-allocated nodes, then the default is to NOT allow oversubscription. Flags 
>> to override these behaviors are provided, so this only affects the default 
>> behavior.
>> 
>> 2. Both cpus/rank and stride have been removed. The latter's removal was 
>> demanded by those who didn't understand the purpose behind it - and I agreed, 
>> as the users who originally requested the feature are no longer using it. The 
>> former was removed temporarily, pending implementation.
>> 
>> 3. vm launch is now the sole method for starting OMPI. It was just too 
>> darned hard to maintain multiple launch procedures - maybe they will return 
>> someday, provided someone can demonstrate a reason to do so.
>> 
>> As Jeff stated, it is impossible to fully test a change of this size. I have 
>> tested it on Linux and Mac, covering all the default and simple options, 
>> singletons, and comm_spawn. That said, I'm sure others will find problems, 
>> so I'll be watching MTT results until this stabilizes.
> 
> 
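
To put point 1 of the quoted commit log in concrete terms: George's machine file above allocates 2 x 8 = 16 slots via a hostfile, so by default a run like

[user@node01 ~]$ mpirun -np 32 --hostfile machines ./a.out

would still be allowed to oversubscribe the nodes, while asking for 32 ranks against a 16-slot RM allocation would now be refused unless the override flag the log mentions is given. (The hostfile name "machines", the prompt, and "./a.out" are placeholders, and the log above does not spell out the flag names.)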