I can take a look at it, but it might be worth checking the trunk now, as several related changes were committed over the last two days.

On Sep 7, 2012, at 9:20 AM, Eugene Loh <eugene....@oracle.com> wrote:

> Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" on the users
> mail list, but not quite sure.
>
> Let's pretend my nodes are called local, r1, and r2. That is, I launch
> mpirun from "local" and there are two other (remote) nodes available to me.
> With the trunk (e.g., v1.9 r27247), I get
>
> % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
> [1,0]<stdout>:r1
> [1,1]<stdout>:r2
> [1,2]<stdout>:r1
> [1,3]<stdout>:r2
> [1,4]<stdout>:r1
> [1,5]<stdout>:r2
>
> which seems right to me. But when the local node is involved:
>
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
> [1,0]<stdout>:local
> [1,1]<stdout>:r1
> [1,2]<stdout>:r1
> [1,3]<stdout>:r1
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 5 slots
> that were requested by the application:
> hostname
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
>
> I'm not seeing all the local slots I should be seeing. We're seeing
> wide-scale MTT trunk failures due to this problem.
>
> There is a similar loss of local slots with hostfile syntax. E.g.,
>
> % hostname
> local
> % cat hostfile
> local
> r1
> % mpirun --hostfile hostfile -n 2 hostname
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
>
> hostfile: hostfile
> node: local
>
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
>
> --------------------------------------------------------------------------
>
> The problem is solved with "--mca orte_default_hostname hostfile".
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel