I can take a look at it, but it might be worth checking the trunk now, as 
several related changes were committed over the last two days.

On Sep 7, 2012, at 9:20 AM, Eugene Loh <eugene....@oracle.com> wrote:

> Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" thread on the 
> users mailing list, but I'm not quite sure.
> 
> Let's pretend my nodes are called local, r1, and r2.  That is, I launch 
> mpirun from "local" and there are two other (remote) nodes available to me.  
> With the trunk (e.g., v1.9 r27247), I get
> 
>    % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
>    [1,0]<stdout>:r1
>    [1,1]<stdout>:r2
>    [1,2]<stdout>:r1
>    [1,3]<stdout>:r2
>    [1,4]<stdout>:r1
>    [1,5]<stdout>:r2
> 
> which seems right to me.  But when the local node is involved:
> 
>    % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
>    [1,0]<stdout>:local
>    [1,1]<stdout>:r1
>    [1,2]<stdout>:r1
>    [1,3]<stdout>:r1
>    % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
>    --------------------------------------------------------------------------
>    There are not enough slots available in the system to satisfy the 5 slots
>    that were requested by the application:
>      hostname
> 
>    Either request fewer slots for your application, or make more slots available
>    for use.
>    --------------------------------------------------------------------------
> 
> I'm not seeing all the local slots I should be seeing.  We're seeing 
> wide-scale MTT trunk failures due to this problem.
> 
> There is a similar loss of local slots with hostfile syntax.  E.g.,
> 
>    % hostname
>    local
>    % cat hostfile
>    local
>    r1
>    % mpirun --hostfile hostfile -n 2 hostname
>    --------------------------------------------------------------------------
>    A hostfile was provided that contains at least one node not
>    present in the allocation:
> 
>      hostfile:  hostfile
>      node:      local
> 
>    If you are operating in a resource-managed environment, then only
>    nodes that are in the allocation can be used in the hostfile. You
>    may find relative node syntax to be a useful alternative to
>    specifying absolute node names see the orte_hosts man page for
>    further information.
> 
>    --------------------------------------------------------------------------
> 
> The problem is solved with "--mca orte_default_hostfile hostfile".
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
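
For reference, the slot accounting that the report above expects can be
sketched in a few lines of Python. This is only an illustrative model, not
Open MPI's actual mapper: it assumes each occurrence of a name in --host
contributes one slot, that --bynode places ranks round-robin across nodes,
and that --nooversubscribe refuses to exceed a node's slot count. The
function names slots_from_host_list and map_bynode are hypothetical.

```python
from collections import Counter


def slots_from_host_list(hosts):
    """Assumed rule: one slot per occurrence of a host name.

    Counter keeps first-seen order (Python 3.7+), so node order is preserved.
    """
    return dict(Counter(hosts))


def map_bynode(slots, nprocs):
    """Model of by-node round-robin placement without oversubscription."""
    if nprocs > sum(slots.values()):
        raise ValueError("not enough slots available")
    remaining = dict(slots)
    placement = []
    while len(placement) < nprocs:
        for node in slots:
            if len(placement) == nprocs:
                break
            if remaining[node] > 0:
                remaining[node] -= 1
                placement.append(node)
    return placement


if __name__ == "__main__":
    # Expected placement if all three local slots were honored.
    expected = slots_from_host_list("local,local,local,r1,r1,r1".split(","))
    print(map_bynode(expected, 5))

    # The reported -np 4 output is what this model gives if "local"
    # only contributes a single slot.
    print(map_bynode({"local": 1, "r1": 3}, 4))
```

Under these assumptions, the -np 4 output in the report (local, r1, r1, r1)
and the failure at -np 5 are both what the model produces when "local" is
credited with one slot instead of three, which is consistent with the
diagnosis that local slots are being lost.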

